---
library_name: transformers
license: apache-2.0
base_model: Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat
tags:
- axolotl
- generated_from_trainer
datasets:
- Dans-DiscountModels/pretokenization-test-5
model-index:
- name: 12b-mn-dans-personality-engine-v1.3.0-TestArticle-1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`
```yaml
base_model: Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code:

# wandb configuration
wandb_project: 12b-mn-dans-personality-engine
wandb_watch:

wandb_run_id: V1.3.0-1-4 # V{Version}-{Run Number}-{Attempt Number}
wandb_log_model:

# push checkpoints to hub
hub_model_id: Dans-DiscountModels/12b-mn-dans-personality-engine-v1.3.0-TestArticle-1
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true

# where to save the finished model to
output_dir: ./12b-mn-dans-personality-engine-v1.3.0

# dataset settings (local or huggingface repo)
datasets:
  - path: Dans-DiscountModels/pretokenization-test-5
    ds_type: parquet
    type:

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true
cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

adapter:
lora_model_dir:

dataset_prepared_path: ./12b-mn-dans-personality-engine-data
val_set_size: 0.003

sequence_len: 32768

sample_packing: true
eval_sample_packing: true

pad_to_sequence_len: true

gradient_checkpointing: true

gradient_accumulation_steps: 2
micro_batch_size: 2

num_epochs: 2

optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"

lr_scheduler: rex
learning_rate: 0.00001
cosine_min_lr_ratio:

weight_decay:

max_grad_norm: 0.001

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

early_stopping_patience:

resume_from_checkpoint:
auto_resume_from_checkpoints: true

local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1

evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:

saves_per_epoch: 2
save_total_limit: 1

debug: false

deepspeed: deepspeed_configs/zero3_bf16.json

fsdp:
fsdp_config:

special_tokens:

```

</details><br>
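
With axolotl installed, a comparable run would typically be launched from this config file via the project's CLI (on recent releases, `axolotl train <config>.yaml`); the exact entry point has shifted between axolotl versions, so treat that command as a pointer rather than a verified invocation.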

# 12b-mn-dans-personality-engine-v1.3.0-TestArticle-1

This model is a fine-tuned version of [Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat](https://huggingface.co/Dans-DiscountModels/Mistral-Nemo-Base-2407-DanChat) on the Dans-DiscountModels/pretokenization-test-5 dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4392
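
For a quick smoke test of the published checkpoint, the standard `transformers` auto classes named in the config (`AutoModelForCausalLM` / `AutoTokenizer`) should suffice. The snippet below is a minimal sketch, not part of the original card: the prompt, the generation settings, and the assumption that the tokenizer carries a chat template from the DanChat base are all illustrative.

```python
# Minimal inference sketch. Assumptions: the checkpoint loads with the plain
# transformers auto classes, and the tokenizer ships a chat template
# (plausible given the DanChat base, but unverified here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/12b-mn-dans-personality-engine-v1.3.0-TestArticle-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a short scene set in a rain-soaked city."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```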

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: ademamix_8bit with args `beta1=0.9,beta2=0.999,beta3=0.999,alpha=5`
- lr_scheduler_type: cosine (as reported by the Trainer; the axolotl config above specifies `lr_scheduler: rex`)
- lr_scheduler_warmup_steps: 321
- num_epochs: 2.0
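
The effective batch sizes above are derived values: total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices = 2 × 2 × 8 = 32, while evaluation uses no gradient accumulation, giving 2 × 8 = 16. Likewise, the 321 warmup steps correspond to the configured warmup_ratio of 0.1 applied to the roughly 3216 optimizer steps of the two-epoch run.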

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.8086 | 0.0006 | 1 | 1.7459 |
| 1.593 | 0.0417 | 67 | 1.5911 |
| 1.5578 | 0.0833 | 134 | 1.5565 |
| 1.5782 | 0.1250 | 201 | 1.5436 |
| 1.5702 | 0.1666 | 268 | 1.5377 |
| 1.5926 | 0.2083 | 335 | 1.5328 |
| 1.6364 | 0.2499 | 402 | 1.5291 |
| 1.5082 | 0.2916 | 469 | 1.5234 |
| 1.6002 | 0.3332 | 536 | 1.5197 |
| 1.5252 | 0.3749 | 603 | 1.5162 |
| 1.5915 | 0.4165 | 670 | 1.5121 |
| 1.5108 | 0.4582 | 737 | 1.5103 |
| 1.5663 | 0.4998 | 804 | 1.5063 |
| 1.5085 | 0.5415 | 871 | 1.5037 |
| 1.4273 | 0.5832 | 938 | 1.5024 |
| 1.5528 | 0.6248 | 1005 | 1.4994 |
| 1.6072 | 0.6665 | 1072 | 1.4975 |
| 1.6074 | 0.7081 | 1139 | 1.4920 |
| 1.5495 | 0.7498 | 1206 | 1.4904 |
| 1.6117 | 0.7914 | 1273 | 1.4883 |
| 1.4621 | 0.8331 | 1340 | 1.4850 |
| 1.6381 | 0.8747 | 1407 | 1.4838 |
| 1.4221 | 0.9164 | 1474 | 1.4813 |
| 1.5812 | 0.9580 | 1541 | 1.4789 |
| 1.4581 | 0.9997 | 1608 | 1.4750 |
| 1.4608 | 1.0417 | 1675 | 1.4800 |
| 1.5261 | 1.0833 | 1742 | 1.4798 |
| 1.3856 | 1.1250 | 1809 | 1.4796 |
| 1.4469 | 1.1666 | 1876 | 1.4766 |
| 1.4783 | 1.2083 | 1943 | 1.4741 |
| 1.5025 | 1.2499 | 2010 | 1.4733 |
| 1.4531 | 1.2916 | 2077 | 1.4726 |
| 1.4719 | 1.3332 | 2144 | 1.4712 |
| 1.4123 | 1.3749 | 2211 | 1.4700 |
| 1.4653 | 1.4165 | 2278 | 1.4673 |
| 1.4571 | 1.4582 | 2345 | 1.4660 |
| 1.4261 | 1.4998 | 2412 | 1.4660 |
| 1.3212 | 1.5415 | 2479 | 1.4620 |
| 1.3828 | 1.5832 | 2546 | 1.4617 |
| 1.3617 | 1.6248 | 2613 | 1.4597 |
| 1.4364 | 1.6665 | 2680 | 1.4567 |
| 1.4686 | 1.7081 | 2747 | 1.4549 |
| 1.3317 | 1.7498 | 2814 | 1.4530 |
| 1.3749 | 1.7914 | 2881 | 1.4506 |
| 1.4116 | 1.8331 | 2948 | 1.4468 |
| 1.3988 | 1.8747 | 3015 | 1.4456 |
| 1.2534 | 1.9164 | 3082 | 1.4448 |
| 1.3564 | 1.9580 | 3149 | 1.4412 |
| 1.3668 | 1.9997 | 3216 | 1.4392 |
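
Validation loss falls steadily from 1.7459 at the first step to 1.4750 by the end of epoch 1, ticks up briefly as the second epoch begins (1.4800 at step 1675), then resumes its decline to the final 1.4392.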

### Framework versions

- Transformers 4.51.3
- PyTorch 2.4.1+cu121
- Datasets 3.5.1
- Tokenizers 0.21.1