SystemAdmin123 committed
Commit 80ed2fd · verified · 1 Parent(s): 4491653

End of training

Files changed (1)
  1. README.md +14 -5
README.md CHANGED
@@ -37,7 +37,7 @@ datasets:
  system_prompt: ''
  device_map: auto
  eval_sample_packing: false
- eval_steps: 200
+ eval_steps: 20
  flash_attention: true
  gradient_checkpointing: true
  group_by_length: true
@@ -55,11 +55,13 @@ output_dir: /root/.sn56/axolotl/tmp/opt-125m
  pad_to_sequence_len: true
  resize_token_embeddings_to_32x: false
  sample_packing: true
- save_steps: 200
+ save_steps: 20
  save_total_limit: 1
  sequence_len: 2048
  tokenizer_type: GPT2TokenizerFast
  torch_dtype: bf16
+ training_args_kwargs:
+   hub_private_repo: true
  trust_remote_code: true
  val_set_size: 0.1
  wandb_entity: ''
@@ -77,6 +79,8 @@ warmup_ratio: 0.05
  # opt-125m
 
  This model is a fine-tuned version of [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) on the argilla/databricks-dolly-15k-curated-en dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 3.2130
 
  ## Model description
 
@@ -110,9 +114,14 @@ The following hyperparameters were used during training:
 
  ### Training results
 
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | No log | 0.1667 | 1 | 3.2664 |
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-------:|:----:|:---------------:|
+ | No log | 0.1667 | 1 | 3.2664 |
+ | 5.5113 | 3.3333 | 20 | 3.2161 |
+ | 5.0084 | 6.6667 | 40 | 3.0989 |
+ | 4.6384 | 10.0 | 60 | 3.1967 |
+ | 4.484 | 13.3333 | 80 | 3.2199 |
+ | 4.4609 | 16.6667 | 100 | 3.2130 |
 
 
  ### Framework versions
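
The results table added in this commit can be read more easily with a few lines of Python. The sketch below is only an illustration: it hard-codes the six logged rows from the table, and the helper names (`best_eval`, `steps_per_epoch`) are hypothetical, not part of Axolotl or Transformers.

```python
# Evaluation log copied from the "Training results" table in this commit.
# Columns: (training_loss, epoch, step, validation_loss); None means "No log".
ROWS = [
    (None, 0.1667, 1, 3.2664),
    (5.5113, 3.3333, 20, 3.2161),
    (5.0084, 6.6667, 40, 3.0989),
    (4.6384, 10.0, 60, 3.1967),
    (4.4840, 13.3333, 80, 3.2199),
    (4.4609, 16.6667, 100, 3.2130),
]


def best_eval(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[3])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    _, epoch, step, _ = rows[-1]
    return round(step / epoch)


best = best_eval(ROWS)
final = ROWS[-1]
print(f"best:  step {best[2]}, validation loss {best[3]:.4f}")
print(f"final: step {final[2]}, validation loss {final[3]:.4f}")
print(f"steps per epoch: ~{steps_per_epoch(ROWS)}")
```

With these rows the lowest validation loss is 3.0989 at step 40, while the loss of 3.2130 reported in the model card is the final evaluation at step 100. The step-to-epoch ratio works out to about 6 steps per epoch, so `eval_steps: 20` corresponds to one evaluation roughly every 3.33 epochs.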