Update README.md
README.md CHANGED
@@ -8,8 +8,6 @@ model-index:
   results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->

 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 <details><summary>See axolotl config</summary>
@@ -107,19 +105,23 @@ special_tokens:

 # qlora-yi-6b-200k-rawrr-run2

-This model was trained
+This model was trained using DPO from the base Yi-6B-200K model.

 ## Model description

-More information needed
+I figured I'd share my attempts as I go. \
+It's my third DPO attempt and the first one where the outputs aren't completely garbled.
+Learnings:
+- DPO needs about a 10-100x lower learning rate than SFT fine-tuning.
+- You probably don't want to add special tokens in your DPO run, who could have guessed...

 ## Intended uses & limitations

-More information needed
+I intend to run an AEZAKMI SFT fine-tune on this base; let's see whether it comes out any better than normal AEZAKMI.

 ## Training and evaluation data

-More information needed
+DPO on the synthetic dataset rawrr_v1.

 ## Training procedure

@@ -139,7 +141,7 @@ The following hyperparameters were used during training:

 ### Training results

-More information needed
+Outputs are actually somewhat coherent!

 ### Framework versions

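A note on the learning-rate learning above: the actual config for this run is in the collapsed axolotl block, but for readers who want to see the shape of a DPO run in code, here is a minimal sketch using TRL's `DPOTrainer` (TRL 0.7-era API), not the axolotl pipeline this model was trained with. The dataset id, LoRA settings, and hyperparameter values are illustrative assumptions; the one deliberate choice is a learning rate set 10-100x below typical SFT values.

```python
# Minimal DPO sketch (illustrative only; this model was trained with axolotl, not this script).
# Assumes a preference dataset with "prompt" / "chosen" / "rejected" columns, as TRL expects.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import DPOTrainer

base = "01-ai/Yi-6B-200K"  # the base model named in this card
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # QLoRA-style 4-bit base
)

# Hypothetical dataset id; the real rawrr_v1 location/schema is not shown in this card.
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

args = TrainingArguments(
    output_dir="qlora-yi-6b-200k-rawrr-run2",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    # The key learning from the card: DPO wants a much smaller LR than SFT,
    # e.g. 5e-7..5e-6 where an SFT run might use 1e-5..2e-4.
    learning_rate=5e-7,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a peft_config, TRL uses the adapter-free base as the reference model
    args=args,
    beta=0.1,  # DPO temperature
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
# Note: no new special tokens are added anywhere above; resizing embeddings mid-DPO is
# exactly what the "don't add special tokens" learning warns against.
trainer.train()
```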
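For the "Training and evaluation data" section: DPO consumes preference pairs rather than plain completions. The record below shows the generic shape such a pair takes; it is an invented example, not an actual rawrr_v1 row.

```python
# Shape of a single DPO preference pair (invented example, not from rawrr_v1).
example = {
    "prompt": "Explain what DPO training changes about a model.",
    "chosen": "DPO pushes the model toward responses the dataset marks as preferred, "
              "directly on preference pairs, without training a separate reward model.",
    "rejected": "As an AI language model, I am unable to help with that.",
}
```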