argilla/phi2-lora-distilabel-intel-orca-dpo-pairs


davidberenstein1957 committed on Jan 25, 2024

Commit 7693f67 (verified) · 1 Parent(s): e6c1c4a

Update README.md

Files changed (1)

README.md +2 -0

@@ -24,6 +24,8 @@ should probably proofread and complete it, then remove this comment. -->
 # phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
+The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).
 It achieves the following results on the evaluation set:
 - Loss: 0.0972
 - Rewards/chosen: 0.2699