adamo1139 committed
Commit c3ab3fa · 1 Parent(s): 56338bb

Update README.md

Files changed (1): README.md (+9 -7)
README.md CHANGED
@@ -8,8 +8,6 @@ model-index:
 results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 <details><summary>See axolotl config</summary>
@@ -107,19 +105,23 @@ special_tokens:
 
 # qlora-yi-6b-200k-rawrr-run2
 
-This model was trained from scratch on the None dataset.
+This model was trained with DPO on top of the base Yi-6B-200K.
 
 ## Model description
 
-More information needed
+I figured I'd share my attempts as I go. \
+This is my third DPO attempt and the first one where the outputs aren't completely garbled.
+Learnings:
+- DPO needs roughly a 10x-100x lower learning rate than SFT fine-tuning.
+- You probably don't want to add special tokens during DPO, who could have guessed...
 
 ## Intended uses & limitations
 
-More information needed
+I intend to run an AEZAKMI SFT finetune on top of this base; let's see whether it comes out any better than normal AEZAKMI.
 
 ## Training and evaluation data
 
-More information needed
+DPO on the synthetic dataset rawrr_v1.
 
 ## Training procedure
 
@@ -139,7 +141,7 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-
+Outputs are actually somewhat coherent!
 
 ### Framework versions
 
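For anyone trying to reproduce the learning-rate takeaway in the diff above, here is a minimal, hypothetical sketch of a DPO run using TRL's `DPOTrainer`. The actual run in this commit used axolotl (see the config in the README), so the trainer choice, the `adamo1139/rawrr_v1` dataset path, the column layout, and every hyperparameter below are illustrative assumptions, not the commit's real settings.

```python
# Hypothetical DPO sketch with TRL's DPOTrainer (API as of trl ~0.7).
# The actual run in this commit used axolotl; names and numbers below
# are illustrative assumptions, not the author's real config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "01-ai/Yi-6B-200K"  # base model named in the README
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Per the "special tokens" learning above: don't add new special tokens
# here; resizing embeddings mid-DPO is an easy way to garble outputs.

# A DPO preference dataset needs prompt/chosen/rejected columns;
# the rawrr_v1 path and split are assumptions.
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

args = TrainingArguments(
    output_dir="qlora-yi-6b-200k-rawrr-run2",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    # The takeaway above: ~10x-100x below a typical SFT LR (e.g. 2e-4 -> 5e-6).
    learning_rate=5e-6,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # trl builds a frozen reference copy when None
    args=args,
    beta=0.1,            # strength of the penalty pulling toward the reference
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Newer trl releases move `beta` into a `DPOConfig` object and rename some arguments, so adjust the sketch for whatever version is installed.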