Model save
README.md CHANGED
@@ -35,7 +35,7 @@ strict: false
 datasets:
   - path: dset_comp3.0_sortpatent_count_pat400_in5_5000.jsonl
     type: chat_template
-
+    field_messages: messages

 dataset_prepared_path: last_run_prepared
 val_set_size: 0.04
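With `type: chat_template`, axolotl renders each training example through the tokenizer's chat template, and the added `field_messages: messages` names the JSONL key that holds the conversation turns. Below is a minimal sketch of what one line of dset_comp3.0_sortpatent_count_pat400_in5_5000.jsonl could look like; the role/content layout and the sample text are assumptions, not taken from the actual dataset.

```python
import json

# Hypothetical record: the chat_template loader reads the list stored under the
# key named by `field_messages` (here "messages"), one JSON object per line.
record = {
    "messages": [
        {"role": "user", "content": "Summarize the independent claim of this patent."},
        {"role": "assistant", "content": "The claim covers a method for ..."},
    ]
}

print(json.dumps(record))  # one such line per training example in the .jsonl file
```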
@@ -60,7 +60,7 @@ wandb_log_model:

 gradient_accumulation_steps: 1
 micro_batch_size: 4 # This will be automatically adjusted based on available GPU memory
-num_epochs:
+num_epochs: 4
 optimizer: adamw_torch_fused
 lr_scheduler: cosine
 learning_rate: 0.00002
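For readers who want to reproduce the optimization setup outside axolotl, the `adamw_torch_fused` / `cosine` / `2e-5` combination above corresponds roughly to the stock PyTorch and transformers APIs sketched below. This is an illustrative equivalent, not the code axolotl runs; the 10 warmup steps come from the hyperparameter list later in the card, and the total step count is an estimate from the results table.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Stand-in parameters; in the real run these are the 8B model's weights.
model = torch.nn.Linear(16, 16)

# adamw_torch_fused with the betas/epsilon reported further down in the card
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    fused=torch.cuda.is_available(),  # use the fused CUDA kernel when a GPU is present
)

# cosine decay with 10 warmup steps; ~656 total steps assumes ~164 steps/epoch
# (inferred from the results table) times num_epochs: 4
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=656
)
```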
@@ -103,7 +103,7 @@ special_tokens:

 This model is a fine-tuned version of [mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) on the dset_comp3.0_sortpatent_count_pat400_in5_5000.jsonl dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.4583

 ## Model description
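Since this diff does not show the fine-tuned repository's own id, the loading sketch below uses a placeholder id; everything else is the ordinary transformers pattern for a Llama-3.1-style chat model trained on `messages`-formatted data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the actual repo id of this fine-tuned checkpoint.
model_id = "your-org/your-finetuned-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The training data is chat-formatted, so prompt through the chat template.
messages = [{"role": "user", "content": "Summarize the main claim of this patent: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```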
@@ -133,15 +133,24 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 4.0

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 0.7 | 0.0061 | 1 | 0.8766 |
-| 0.
-| 0.
+| 0.6414 | 0.3354 | 55 | 0.6293 |
+| 0.5608 | 0.6707 | 110 | 0.5473 |
+| 0.4733 | 1.0061 | 165 | 0.5161 |
+| 0.5142 | 1.3415 | 220 | 0.4954 |
+| 0.4771 | 1.6768 | 275 | 0.4824 |
+| 0.423 | 2.0122 | 330 | 0.4750 |
+| 0.4375 | 2.3476 | 385 | 0.4676 |
+| 0.4311 | 2.6829 | 440 | 0.4630 |
+| 0.4019 | 3.0183 | 495 | 0.4620 |
+| 0.4726 | 3.3537 | 550 | 0.4589 |
+| 0.4677 | 3.6890 | 605 | 0.4583 |


 ### Framework versions
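As a quick sanity check on the table, the Epoch and Step columns are consistent with a constant number of optimizer steps per epoch; the arithmetic below assumes Epoch is simply Step divided by the steps-per-epoch count.

```python
# From the results table: step 55 corresponds to epoch 0.3354.
steps_per_epoch = 55 / 0.3354           # ≈ 164 optimizer steps per epoch
print(round(steps_per_epoch))           # 164

# The last logged eval at step 605 then falls at ≈ 3.69 epochs, matching the
# reported 3.6890 and fitting within num_epochs: 4 (≈ 656 steps in total).
print(round(605 / steps_per_epoch, 2))  # ≈ 3.69
```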