cgifbribcgfbi committed · verified
Commit aa73316 · 1 Parent(s): 311fad9

Model save

Files changed (1): README.md (+15, −6)
README.md CHANGED
@@ -35,7 +35,7 @@ strict: false
  datasets:
  - path: dset_comp3.0_sortpatent_count_pat400_in5_5000.jsonl
    type: chat_template
-   split: train
+   field_messages: messages

  dataset_prepared_path: last_run_prepared
  val_set_size: 0.04
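In axolotl's `chat_template` dataset format, `field_messages` names the JSONL key that holds the list of chat turns for each example; this commit sets it explicitly in place of the `split: train` line. A minimal sketch of how the stanza might read after the change (indentation and comments are illustrative assumptions, not copied from the repository):

```yaml
# Sketch of the updated datasets stanza, assuming standard axolotl layout;
# comments are interpretation, not part of the original config.
datasets:
  - path: dset_comp3.0_sortpatent_count_pat400_in5_5000.jsonl
    type: chat_template          # format samples with the model's chat template
    field_messages: messages     # JSONL key containing the list of chat turns

dataset_prepared_path: last_run_prepared
val_set_size: 0.04               # 4% of the data held out for evaluation
```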
@@ -60,7 +60,7 @@ wandb_log_model:

  gradient_accumulation_steps: 1
  micro_batch_size: 4 # This will be automatically adjusted based on available GPU memory
- num_epochs: 1
+ num_epochs: 4
  optimizer: adamw_torch_fused
  lr_scheduler: cosine
  learning_rate: 0.00002
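The only schedule change in this hunk is the epoch count. For reference, a consolidated sketch of the schedule-related keys as they stand after the commit; the values are copied from the diff above, and the comments are interpretation rather than part of the original config:

```yaml
gradient_accumulation_steps: 1   # no accumulation: one optimizer step per micro-batch
micro_batch_size: 4              # may be adjusted automatically to fit GPU memory
num_epochs: 4                    # raised from 1 in this commit
optimizer: adamw_torch_fused
lr_scheduler: cosine             # cosine decay now stretched over the 4-epoch run
learning_rate: 0.00002
```

For scale, the training-results table further down logs an evaluation at step 1 and then every 55 steps, reaching epoch 1.0061 at step 165; one epoch is therefore roughly 164 optimizer steps, and the four-epoch run finishes near step 656.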
@@ -103,7 +103,7 @@ special_tokens:

  This model is a fine-tuned version of [mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) on the dset_comp3.0_sortpatent_count_pat400_in5_5000.jsonl dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.5731
+ - Loss: 0.4583

  ## Model description

@@ -133,15 +133,24 @@ The following hyperparameters were used during training:
  - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - num_epochs: 1.0
+ - num_epochs: 4.0

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
  | 0.7 | 0.0061 | 1 | 0.8766 |
- | 0.6465 | 0.3354 | 55 | 0.6349 |
- | 0.5865 | 0.6707 | 110 | 0.5731 |
+ | 0.6414 | 0.3354 | 55 | 0.6293 |
+ | 0.5608 | 0.6707 | 110 | 0.5473 |
+ | 0.4733 | 1.0061 | 165 | 0.5161 |
+ | 0.5142 | 1.3415 | 220 | 0.4954 |
+ | 0.4771 | 1.6768 | 275 | 0.4824 |
+ | 0.423 | 2.0122 | 330 | 0.4750 |
+ | 0.4375 | 2.3476 | 385 | 0.4676 |
+ | 0.4311 | 2.6829 | 440 | 0.4630 |
+ | 0.4019 | 3.0183 | 495 | 0.4620 |
+ | 0.4726 | 3.3537 | 550 | 0.4589 |
+ | 0.4677 | 3.6890 | 605 | 0.4583 |


  ### Framework versions