Training in progress, step 100, checkpoint


Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +5 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +20 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +43 -0
last-checkpoint/trainer_state.json +741 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2-0.5B
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-0.5B",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "gate_proj",
+    "q_proj",
+    "up_proj",
+    "k_proj",
+    "down_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ff9f6afdf36c3538e0c2127695154fa15ef684b2e05ae607720a8159dabe55b
+size 35237104
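As a rough sanity check, the adapter size reported in the pointer above can be compared with the LoRA parameter count implied by `adapter_config.json` (`r`, `lora_alpha`, `target_modules`). The sketch below is only an estimate: the Qwen2-0.5B layer shapes are assumptions taken from the public base-model configuration and do not appear in this commit, and the adapter weights are assumed to be stored in fp32.

```python
# Rough LoRA size estimate for this checkpoint.
# ASSUMPTION: the Qwen2-0.5B shapes below come from the public base-model
# config, not from this commit.
HIDDEN = 896          # hidden_size (assumed)
INTERMEDIATE = 4864   # intermediate_size (assumed)
KV_DIM = 128          # num_key_value_heads * head_dim = 2 * 64 (assumed)
NUM_LAYERS = 24       # num_hidden_layers (assumed)

R = 16                # "r" in adapter_config.json
LORA_ALPHA = 32       # "lora_alpha" in adapter_config.json

# (in_features, out_features) for every entry in "target_modules".
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


params_per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total_params = params_per_layer * NUM_LAYERS
fp32_bytes = total_params * 4      # ASSUMPTION: weights saved as fp32
scaling = LORA_ALPHA / R           # standard LoRA scaling ("use_rslora" is false)

print(f"LoRA parameters: {total_params:,}")
print(f"Estimated fp32 payload: {fp32_bytes:,} bytes")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the estimated payload falls a few tens of kilobytes below the 35237104 bytes listed in the pointer, which is what one would expect once the safetensors header is added.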

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:08cb57b39aedef824bde162cc92141b7238130755bae7727f0da8176c6607da2
+size 18810036
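The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the optimizer.pt diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:08cb57b39aedef824bde162cc92141b7238130755bae7727f0da8176c6607da2
size 18810036
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```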

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:629963c694d966464851cae9fe2f87acf68234e8ffe0c4f080bc1b28cc5d0f37
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1a5a353aa956358db56e4e8110f1cf929e76f1178ff0441174be62305b48b0e
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bcfe42da0a4497e8b2b172c1f9f4ec423a46dc12907f4349c55025f670422ba9
+size 11418266

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,741 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.03624173235480656,
+  "eval_steps": 250,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0003624173235480656,
+      "grad_norm": 11.182452201843262,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 5.1439,
+      "step": 1
+    },
+    {
+      "epoch": 0.0003624173235480656,
+      "eval_loss": 5.493607521057129,
+      "eval_runtime": 179.3868,
+      "eval_samples_per_second": 6.478,
+      "eval_steps_per_second": 3.239,
+      "step": 1
+    },
+    {
+      "epoch": 0.0007248346470961312,
+      "grad_norm": 13.025100708007812,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 5.0504,
+      "step": 2
+    },
+    {
+      "epoch": 0.0010872519706441968,
+      "grad_norm": 13.570545196533203,
+      "learning_rate": 6e-06,
+      "loss": 5.4292,
+      "step": 3
+    },
+    {
+      "epoch": 0.0014496692941922623,
+      "grad_norm": 15.024659156799316,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 5.4018,
+      "step": 4
+    },
+    {
+      "epoch": 0.001812086617740328,
+      "grad_norm": 16.558658599853516,
+      "learning_rate": 1e-05,
+      "loss": 5.4569,
+      "step": 5
+    },
+    {
+      "epoch": 0.0021745039412883935,
+      "grad_norm": 16.7086181640625,
+      "learning_rate": 1.2e-05,
+      "loss": 6.4055,
+      "step": 6
+    },
+    {
+      "epoch": 0.0025369212648364593,
+      "grad_norm": 12.39970588684082,
+      "learning_rate": 1.4000000000000001e-05,
+      "loss": 5.6735,
+      "step": 7
+    },
+    {
+      "epoch": 0.0028993385883845247,
+      "grad_norm": 10.910441398620605,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 4.8631,
+      "step": 8
+    },
+    {
+      "epoch": 0.0032617559119325905,
+      "grad_norm": 12.739341735839844,
+      "learning_rate": 1.8e-05,
+      "loss": 5.0609,
+      "step": 9
+    },
+    {
+      "epoch": 0.003624173235480656,
+      "grad_norm": 14.194818496704102,
+      "learning_rate": 2e-05,
+      "loss": 5.217,
+      "step": 10
+    },
+    {
+      "epoch": 0.003986590559028722,
+      "grad_norm": 12.306241989135742,
+      "learning_rate": 2.2000000000000003e-05,
+      "loss": 5.1477,
+      "step": 11
+    },
+    {
+      "epoch": 0.004349007882576787,
+      "grad_norm": 16.202953338623047,
+      "learning_rate": 2.4e-05,
+      "loss": 5.96,
+      "step": 12
+    },
+    {
+      "epoch": 0.004711425206124852,
+      "grad_norm": 15.335165023803711,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 4.2017,
+      "step": 13
+    },
+    {
+      "epoch": 0.005073842529672919,
+      "grad_norm": 16.55255889892578,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 4.4928,
+      "step": 14
+    },
+    {
+      "epoch": 0.005436259853220984,
+      "grad_norm": 15.465089797973633,
+      "learning_rate": 3e-05,
+      "loss": 4.1116,
+      "step": 15
+    },
+    {
+      "epoch": 0.005798677176769049,
+      "grad_norm": 13.577486991882324,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 4.2593,
+      "step": 16
+    },
+    {
+      "epoch": 0.006161094500317115,
+      "grad_norm": 9.916369438171387,
+      "learning_rate": 3.4000000000000007e-05,
+      "loss": 4.0415,
+      "step": 17
+    },
+    {
+      "epoch": 0.006523511823865181,
+      "grad_norm": 13.53633975982666,
+      "learning_rate": 3.6e-05,
+      "loss": 3.5466,
+      "step": 18
+    },
+    {
+      "epoch": 0.006885929147413246,
+      "grad_norm": 9.186629295349121,
+      "learning_rate": 3.8e-05,
+      "loss": 3.8758,
+      "step": 19
+    },
+    {
+      "epoch": 0.007248346470961312,
+      "grad_norm": 9.863125801086426,
+      "learning_rate": 4e-05,
+      "loss": 3.3592,
+      "step": 20
+    },
+    {
+      "epoch": 0.007610763794509377,
+      "grad_norm": 10.771843910217285,
+      "learning_rate": 4.2e-05,
+      "loss": 3.5848,
+      "step": 21
+    },
+    {
+      "epoch": 0.007973181118057443,
+      "grad_norm": 15.665508270263672,
+      "learning_rate": 4.4000000000000006e-05,
+      "loss": 4.1427,
+      "step": 22
+    },
+    {
+      "epoch": 0.008335598441605509,
+      "grad_norm": 14.37973690032959,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 4.0961,
+      "step": 23
+    },
+    {
+      "epoch": 0.008698015765153574,
+      "grad_norm": 10.397851943969727,
+      "learning_rate": 4.8e-05,
+      "loss": 2.5653,
+      "step": 24
+    },
+    {
+      "epoch": 0.00906043308870164,
+      "grad_norm": 9.413538932800293,
+      "learning_rate": 5e-05,
+      "loss": 3.564,
+      "step": 25
+    },
+    {
+      "epoch": 0.009422850412249705,
+      "grad_norm": 13.236395835876465,
+      "learning_rate": 5.2000000000000004e-05,
+      "loss": 4.0018,
+      "step": 26
+    },
+    {
+      "epoch": 0.009785267735797772,
+      "grad_norm": 9.422822952270508,
+      "learning_rate": 5.4000000000000005e-05,
+      "loss": 3.5823,
+      "step": 27
+    },
+    {
+      "epoch": 0.010147685059345837,
+      "grad_norm": 10.403535842895508,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 3.2027,
+      "step": 28
+    },
+    {
+      "epoch": 0.010510102382893903,
+      "grad_norm": 7.67515754699707,
+      "learning_rate": 5.8e-05,
+      "loss": 3.4098,
+      "step": 29
+    },
+    {
+      "epoch": 0.010872519706441968,
+      "grad_norm": 9.647613525390625,
+      "learning_rate": 6e-05,
+      "loss": 3.5071,
+      "step": 30
+    },
+    {
+      "epoch": 0.011234937029990033,
+      "grad_norm": 10.478097915649414,
+      "learning_rate": 6.2e-05,
+      "loss": 3.5463,
+      "step": 31
+    },
+    {
+      "epoch": 0.011597354353538099,
+      "grad_norm": 9.042500495910645,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 3.3199,
+      "step": 32
+    },
+    {
+      "epoch": 0.011959771677086164,
+      "grad_norm": 6.602804660797119,
+      "learning_rate": 6.6e-05,
+      "loss": 2.582,
+      "step": 33
+    },
+    {
+      "epoch": 0.01232218900063423,
+      "grad_norm": 6.278406620025635,
+      "learning_rate": 6.800000000000001e-05,
+      "loss": 2.9036,
+      "step": 34
+    },
+    {
+      "epoch": 0.012684606324182297,
+      "grad_norm": 8.419038772583008,
+      "learning_rate": 7e-05,
+      "loss": 3.5126,
+      "step": 35
+    },
+    {
+      "epoch": 0.013047023647730362,
+      "grad_norm": 6.637441158294678,
+      "learning_rate": 7.2e-05,
+      "loss": 3.2632,
+      "step": 36
+    },
+    {
+      "epoch": 0.013409440971278427,
+      "grad_norm": 6.850187301635742,
+      "learning_rate": 7.4e-05,
+      "loss": 3.0642,
+      "step": 37
+    },
+    {
+      "epoch": 0.013771858294826493,
+      "grad_norm": 9.593006134033203,
+      "learning_rate": 7.6e-05,
+      "loss": 2.8089,
+      "step": 38
+    },
+    {
+      "epoch": 0.014134275618374558,
+      "grad_norm": 9.009121894836426,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 3.5287,
+      "step": 39
+    },
+    {
+      "epoch": 0.014496692941922623,
+      "grad_norm": 10.353368759155273,
+      "learning_rate": 8e-05,
+      "loss": 3.2686,
+      "step": 40
+    },
+    {
+      "epoch": 0.014859110265470689,
+      "grad_norm": 6.461148262023926,
+      "learning_rate": 8.2e-05,
+      "loss": 2.6648,
+      "step": 41
+    },
+    {
+      "epoch": 0.015221527589018754,
+      "grad_norm": 6.385082721710205,
+      "learning_rate": 8.4e-05,
+      "loss": 3.0381,
+      "step": 42
+    },
+    {
+      "epoch": 0.015583944912566821,
+      "grad_norm": 6.34943962097168,
+      "learning_rate": 8.6e-05,
+      "loss": 2.153,
+      "step": 43
+    },
+    {
+      "epoch": 0.015946362236114887,
+      "grad_norm": 8.471352577209473,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 3.9041,
+      "step": 44
+    },
+    {
+      "epoch": 0.01630877955966295,
+      "grad_norm": 11.165514945983887,
+      "learning_rate": 9e-05,
+      "loss": 3.4919,
+      "step": 45
+    },
+    {
+      "epoch": 0.016671196883211017,
+      "grad_norm": 8.618705749511719,
+      "learning_rate": 9.200000000000001e-05,
+      "loss": 3.817,
+      "step": 46
+    },
+    {
+      "epoch": 0.017033614206759085,
+      "grad_norm": 8.601175308227539,
+      "learning_rate": 9.4e-05,
+      "loss": 3.3323,
+      "step": 47
+    },
+    {
+      "epoch": 0.017396031530307148,
+      "grad_norm": 10.56294059753418,
+      "learning_rate": 9.6e-05,
+      "loss": 2.9041,
+      "step": 48
+    },
+    {
+      "epoch": 0.017758448853855215,
+      "grad_norm": 11.486249923706055,
+      "learning_rate": 9.8e-05,
+      "loss": 2.8437,
+      "step": 49
+    },
+    {
+      "epoch": 0.01812086617740328,
+      "grad_norm": 7.766645908355713,
+      "learning_rate": 0.0001,
+      "loss": 2.8294,
+      "step": 50
+    },
+    {
+      "epoch": 0.018483283500951346,
+      "grad_norm": 8.898940086364746,
+      "learning_rate": 9.999972660400536e-05,
+      "loss": 3.2187,
+      "step": 51
+    },
+    {
+      "epoch": 0.01884570082449941,
+      "grad_norm": 9.619450569152832,
+      "learning_rate": 9.999890641901125e-05,
+      "loss": 3.5205,
+      "step": 52
+    },
+    {
+      "epoch": 0.019208118148047477,
+      "grad_norm": 9.052363395690918,
+      "learning_rate": 9.999753945398704e-05,
+      "loss": 2.9757,
+      "step": 53
+    },
+    {
+      "epoch": 0.019570535471595544,
+      "grad_norm": 11.377963066101074,
+      "learning_rate": 9.99956257238817e-05,
+      "loss": 4.2223,
+      "step": 54
+    },
+    {
+      "epoch": 0.019932952795143608,
+      "grad_norm": 8.070955276489258,
+      "learning_rate": 9.999316524962345e-05,
+      "loss": 3.0801,
+      "step": 55
+    },
+    {
+      "epoch": 0.020295370118691675,
+      "grad_norm": 8.904732704162598,
+      "learning_rate": 9.999015805811965e-05,
+      "loss": 3.6249,
+      "step": 56
+    },
+    {
+      "epoch": 0.020657787442239738,
+      "grad_norm": 7.650445461273193,
+      "learning_rate": 9.998660418225645e-05,
+      "loss": 2.6439,
+      "step": 57
+    },
+    {
+      "epoch": 0.021020204765787805,
+      "grad_norm": 7.753682613372803,
+      "learning_rate": 9.998250366089848e-05,
+      "loss": 2.3483,
+      "step": 58
+    },
+    {
+      "epoch": 0.02138262208933587,
+      "grad_norm": 7.504074573516846,
+      "learning_rate": 9.997785653888835e-05,
+      "loss": 3.1218,
+      "step": 59
+    },
+    {
+      "epoch": 0.021745039412883936,
+      "grad_norm": 7.84480619430542,
+      "learning_rate": 9.997266286704631e-05,
+      "loss": 2.9812,
+      "step": 60
+    },
+    {
+      "epoch": 0.022107456736432,
+      "grad_norm": 7.554739475250244,
+      "learning_rate": 9.996692270216947e-05,
+      "loss": 3.2481,
+      "step": 61
+    },
+    {
+      "epoch": 0.022469874059980067,
+      "grad_norm": 6.9786376953125,
+      "learning_rate": 9.996063610703137e-05,
+      "loss": 3.28,
+      "step": 62
+    },
+    {
+      "epoch": 0.022832291383528134,
+      "grad_norm": 9.193137168884277,
+      "learning_rate": 9.995380315038119e-05,
+      "loss": 3.8813,
+      "step": 63
+    },
+    {
+      "epoch": 0.023194708707076198,
+      "grad_norm": 7.26225471496582,
+      "learning_rate": 9.994642390694308e-05,
+      "loss": 4.0565,
+      "step": 64
+    },
+    {
+      "epoch": 0.023557126030624265,
+      "grad_norm": 8.442785263061523,
+      "learning_rate": 9.993849845741524e-05,
+      "loss": 2.9904,
+      "step": 65
+    },
+    {
+      "epoch": 0.02391954335417233,
+      "grad_norm": 8.306997299194336,
+      "learning_rate": 9.993002688846913e-05,
+      "loss": 4.3562,
+      "step": 66
+    },
+    {
+      "epoch": 0.024281960677720395,
+      "grad_norm": 6.037675857543945,
+      "learning_rate": 9.992100929274846e-05,
+      "loss": 2.8997,
+      "step": 67
+    },
+    {
+      "epoch": 0.02464437800126846,
+      "grad_norm": 7.014623165130615,
+      "learning_rate": 9.991144576886823e-05,
+      "loss": 3.2675,
+      "step": 68
+    },
+    {
+      "epoch": 0.025006795324816526,
+      "grad_norm": 10.0225830078125,
+      "learning_rate": 9.990133642141359e-05,
+      "loss": 3.167,
+      "step": 69
+    },
+    {
+      "epoch": 0.025369212648364593,
+      "grad_norm": 6.834861755371094,
+      "learning_rate": 9.989068136093873e-05,
+      "loss": 3.7237,
+      "step": 70
+    },
+    {
+      "epoch": 0.025731629971912657,
+      "grad_norm": 8.42022705078125,
+      "learning_rate": 9.987948070396571e-05,
+      "loss": 3.6488,
+      "step": 71
+    },
+    {
+      "epoch": 0.026094047295460724,
+      "grad_norm": 8.836771965026855,
+      "learning_rate": 9.986773457298311e-05,
+      "loss": 2.8825,
+      "step": 72
+    },
+    {
+      "epoch": 0.026456464619008788,
+      "grad_norm": 7.350561141967773,
+      "learning_rate": 9.985544309644475e-05,
+      "loss": 2.7139,
+      "step": 73
+    },
+    {
+      "epoch": 0.026818881942556855,
+      "grad_norm": 7.524517059326172,
+      "learning_rate": 9.984260640876821e-05,
+      "loss": 3.0449,
+      "step": 74
+    },
+    {
+      "epoch": 0.02718129926610492,
+      "grad_norm": 7.376155853271484,
+      "learning_rate": 9.98292246503335e-05,
+      "loss": 3.4703,
+      "step": 75
+    },
+    {
+      "epoch": 0.027543716589652985,
+      "grad_norm": 8.42771053314209,
+      "learning_rate": 9.981529796748134e-05,
+      "loss": 3.1457,
+      "step": 76
+    },
+    {
+      "epoch": 0.027906133913201053,
+      "grad_norm": 6.817777156829834,
+      "learning_rate": 9.980082651251175e-05,
+      "loss": 2.8429,
+      "step": 77
+    },
+    {
+      "epoch": 0.028268551236749116,
+      "grad_norm": 9.635848999023438,
+      "learning_rate": 9.97858104436822e-05,
+      "loss": 2.7556,
+      "step": 78
+    },
+    {
+      "epoch": 0.028630968560297183,
+      "grad_norm": 8.276021003723145,
+      "learning_rate": 9.977024992520602e-05,
+      "loss": 2.9938,
+      "step": 79
+    },
+    {
+      "epoch": 0.028993385883845247,
+      "grad_norm": 7.966391086578369,
+      "learning_rate": 9.975414512725057e-05,
+      "loss": 2.5796,
+      "step": 80
+    },
+    {
+      "epoch": 0.029355803207393314,
+      "grad_norm": 7.704254627227783,
+      "learning_rate": 9.973749622593534e-05,
+      "loss": 3.3697,
+      "step": 81
+    },
+    {
+      "epoch": 0.029718220530941378,
+      "grad_norm": 6.093676567077637,
+      "learning_rate": 9.972030340333001e-05,
+      "loss": 2.0882,
+      "step": 82
+    },
+    {
+      "epoch": 0.030080637854489445,
+      "grad_norm": 6.984614372253418,
+      "learning_rate": 9.970256684745258e-05,
+      "loss": 2.9574,
+      "step": 83
+    },
+    {
+      "epoch": 0.03044305517803751,
+      "grad_norm": 8.029525756835938,
+      "learning_rate": 9.968428675226714e-05,
+      "loss": 3.4263,
+      "step": 84
+    },
+    {
+      "epoch": 0.030805472501585576,
+      "grad_norm": 7.764573574066162,
+      "learning_rate": 9.966546331768191e-05,
+      "loss": 2.9489,
+      "step": 85
+    },
+    {
+      "epoch": 0.031167889825133643,
+      "grad_norm": 8.062830924987793,
+      "learning_rate": 9.964609674954696e-05,
+      "loss": 3.4623,
+      "step": 86
+    },
+    {
+      "epoch": 0.031530307148681706,
+      "grad_norm": 8.195585250854492,
+      "learning_rate": 9.962618725965196e-05,
+      "loss": 3.3308,
+      "step": 87
+    },
+    {
+      "epoch": 0.03189272447222977,
+      "grad_norm": 8.735365867614746,
+      "learning_rate": 9.96057350657239e-05,
+      "loss": 3.0993,
+      "step": 88
+    },
+    {
+      "epoch": 0.03225514179577784,
+      "grad_norm": 8.674100875854492,
+      "learning_rate": 9.95847403914247e-05,
+      "loss": 3.565,
+      "step": 89
+    },
+    {
+      "epoch": 0.0326175591193259,
+      "grad_norm": 8.637243270874023,
+      "learning_rate": 9.956320346634876e-05,
+      "loss": 2.3598,
+      "step": 90
+    },
+    {
+      "epoch": 0.03297997644287397,
+      "grad_norm": 6.5536370277404785,
+      "learning_rate": 9.954112452602045e-05,
+      "loss": 2.4113,
+      "step": 91
+    },
+    {
+      "epoch": 0.033342393766422035,
+      "grad_norm": 10.972240447998047,
+      "learning_rate": 9.95185038118915e-05,
+      "loss": 3.7961,
+      "step": 92
+    },
+    {
+      "epoch": 0.0337048110899701,
+      "grad_norm": 6.933289051055908,
+      "learning_rate": 9.949534157133844e-05,
+      "loss": 2.9521,
+      "step": 93
+    },
+    {
+      "epoch": 0.03406722841351817,
+      "grad_norm": 7.79304313659668,
+      "learning_rate": 9.94716380576598e-05,
+      "loss": 2.8418,
+      "step": 94
+    },
+    {
+      "epoch": 0.03442964573706623,
+      "grad_norm": 10.257481575012207,
+      "learning_rate": 9.944739353007344e-05,
+      "loss": 3.3929,
+      "step": 95
+    },
+    {
+      "epoch": 0.034792063060614296,
+      "grad_norm": 14.109910011291504,
+      "learning_rate": 9.942260825371358e-05,
+      "loss": 3.4003,
+      "step": 96
+    },
+    {
+      "epoch": 0.035154480384162363,
+      "grad_norm": 8.312098503112793,
+      "learning_rate": 9.939728249962807e-05,
+      "loss": 2.9074,
+      "step": 97
+    },
+    {
+      "epoch": 0.03551689770771043,
+      "grad_norm": 5.549032688140869,
+      "learning_rate": 9.937141654477528e-05,
+      "loss": 3.2421,
+      "step": 98
+    },
+    {
+      "epoch": 0.03587931503125849,
+      "grad_norm": 7.798683166503906,
+      "learning_rate": 9.934501067202117e-05,
+      "loss": 2.8651,
+      "step": 99
+    },
+    {
+      "epoch": 0.03624173235480656,
+      "grad_norm": 7.199075222015381,
+      "learning_rate": 9.931806517013612e-05,
+      "loss": 3.3106,
+      "step": 100
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7209543008256000.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
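The `learning_rate` values in `log_history` look consistent with a linear warmup followed by cosine decay. The sketch below reproduces a few logged values under that reading. The peak rate (1e-4) and the warmup length (50 steps) are inferred from the logs and are assumptions, since the actual settings live in the binary `training_args.bin`; only `max_steps` (1000) is stated in `trainer_state.json`.

```python
import math

BASE_LR = 1e-4       # ASSUMPTION: peak rate, inferred from the value logged at step 50
WARMUP_STEPS = 50    # ASSUMPTION: inferred from the linear ramp over steps 1..50
MAX_STEPS = 1000     # "max_steps" in trainer_state.json


def lr_at(step):
    """Linear warmup to BASE_LR, then cosine decay towards zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history above.
LOGGED = {
    1: 2.0000000000000003e-06,
    50: 0.0001,
    51: 9.999972660400536e-05,
    75: 9.98292246503335e-05,
    100: 9.931806517013612e-05,
}

for step, logged in LOGGED.items():
    print(f"step {step:>3}: computed {lr_at(step):.12e}  logged {logged:.12e}")
```

If the inference holds, the computed and logged columns should agree closely at every step listed.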

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:694607566dc399b4f369d31b4f66027732b5958cebe96d86b5577d4a72cd3d9b
+size 6712

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render. See raw diff