Kearm committed (verified)
Commit a29cf36 · Parent: 6fdf3d5

Update README.md

Actual model card with proper information.

Files changed (1):
1. README.md (+65, -58)
README.md CHANGED
@@ -2,15 +2,75 @@
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-14B
- tags:
- - generated_from_trainer
model-index:
- name: LLaMutation-Qwen2.5-14B-SFFT-v0.0
  results: []
---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
 
  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>
@@ -120,57 +180,4 @@ weight_decay: 0.1
# fsdp_mixed_precision: BF16 # Added
```
 
- </details><br>
-
- # LLaMutation-Qwen2.5-14B-SFFT-v0.0
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2621
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0005
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 32
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 50
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.3948 | 0.0237 | 1 | 0.3920 |
- | 0.2392 | 0.4970 | 21 | 0.2500 |
- | 0.2606 | 0.9941 | 42 | 0.2621 |
-
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.3.1+cu121
- - Datasets 3.0.1
- - Tokenizers 0.20.1
 
+ # LLaMutation-Qwen2.5-14B-SFFT-v0.0
+
+ ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp)
+
+ This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code-translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit).
+
+ ## Model description
+
+ A code translation and completion model trained from Qwen2.5-14B, as there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
+
+ I will refine the model for completion and also create an instruct/chat variant.
+
+ ## Intended uses & limitations
+
+ Code translation with differing system prompts, and use as a tab-autocomplete model with [continue.dev](https://www.continue.dev/).
+
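A rough sketch of completion-style (tab-autocomplete) use with `transformers`; the repo id below is a placeholder for wherever the model is hosted, and the greedy decoding is only illustrative, not the tuned sampling parameters shown further down:

```python
# Sketch: raw completion use (no chat template), as a tab-autocomplete backend would call it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # placeholder; replace with the full repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Feed an unfinished snippet; the model continues it.
prefix = "def fibonacci(n: int) -> int:\n    "
inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```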
+ ## Chat template and sampling parameters
+
+ The chat template is ChatML.
+
+ The sampling parameters used for generation in the hackathon demo are shown here:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png)
+
+ ### THE SYSTEM PROMPT BELOW MUST BE USED WITH THIS MODEL
+
+ `You are an Al assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the a minimum possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE DO NOT ITTERATE!`
+
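A minimal chat-style sketch, assuming the tokenizer ships the ChatML template described above; the repo id and sampling values are placeholders and should be matched to the screenshot:

```python
# Sketch: code-translation request through the ChatML chat template with the required system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # placeholder; replace with the full repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# System prompt copied verbatim from the card above.
system_prompt = (
    "You are an Al assistant that is an expert at converting code from any language to another "
    "within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. "
    "Keep non code text to the a minimum possible. DO NOT REPEAT ANY NON CODE TEXT. "
    "ONLY PRINT OUT CODE ONCE DO NOT ITTERATE!"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Convert this Python function to Rust:\n\ndef add(a, b):\n    return a + b"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)  # placeholder sampling values
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```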
+ ## Training procedure
+
+ Spectrum FFT/SFFT
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0005
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 50
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.3948 | 0.0237 | 1 | 0.3920 |
+ | 0.2392 | 0.4970 | 21 | 0.2500 |
+ | 0.2606 | 0.9941 | 42 | 0.2621 |
+
+
+ ### Framework versions
+
+ - Transformers 4.45.2
+ - Pytorch 2.3.1+cu121
+ - Datasets 3.0.1
+ - Tokenizers 0.20.1
 
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

# fsdp_mixed_precision: BF16 # Added
```

+ </details><br>