adamo1139 committed
Commit c3ab3fa · 1 Parent(s): 56338bb

Update README.md

Files changed (1): README.md (+9 -7)
README.md CHANGED
@@ -8,8 +8,6 @@ model-index:
 results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 <details><summary>See axolotl config</summary>
@@ -107,19 +105,23 @@ special_tokens:
 
 # qlora-yi-6b-200k-rawrr-run2
 
-This model was trained from scratch on the None dataset.
+This model was trained with DPO on top of the base Yi-6B-200K.
 
 ## Model description
 
-More information needed
+I figured I'd share my attempts as I go. \
+This is my third DPO attempt and the first one where the outputs aren't completely garbled.
+Learnings:
+- DPO needs roughly a 10x-100x lower learning rate than SFT fine-tuning.
+- You probably don't want to add special tokens during DPO, who could have guessed...
 
 ## Intended uses & limitations
 
-More information needed
+I intend to run an AEZAKMI SFT finetune on top of this base; let's see whether it comes out any better than normal AEZAKMI.
 
 ## Training and evaluation data
 
-More information needed
+DPO on the synthetic dataset rawrr_v1.
 
 ## Training procedure
 
@@ -139,7 +141,7 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-
+Outputs are actually somewhat coherent!
 
 ### Framework versions
 
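For anyone trying to reproduce the learning-rate takeaway in the diff above, here is a minimal, hypothetical sketch of a DPO run using TRL's `DPOTrainer`. The actual run in this commit used axolotl (see the config in the README), so the trainer choice, the `adamo1139/rawrr_v1` dataset path, the column layout, and every hyperparameter below are illustrative assumptions, not the commit's real settings.

```python
# Hypothetical DPO sketch with TRL's DPOTrainer (API as of trl ~0.7).
# The actual run in this commit used axolotl; names and numbers below
# are illustrative assumptions, not the author's real config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "01-ai/Yi-6B-200K"  # base model named in the README
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Per the "special tokens" learning above: don't add new special tokens
# here; resizing embeddings mid-DPO is an easy way to garble outputs.

# A DPO preference dataset needs prompt/chosen/rejected columns;
# the rawrr_v1 path and split are assumptions.
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

args = TrainingArguments(
    output_dir="qlora-yi-6b-200k-rawrr-run2",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    # The takeaway above: ~10x-100x below a typical SFT LR (e.g. 2e-4 -> 5e-6).
    learning_rate=5e-6,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # trl builds a frozen reference copy when None
    args=args,
    beta=0.1,            # strength of the penalty pulling toward the reference
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Newer trl releases move `beta` into a `DPOConfig` object and rename some arguments, so adjust the sketch for whatever version is installed.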