glorgao/SelectiveDPO-Llama3-8B-SFT-UFBinarized

Text Generation

text-generation-inference


glorgao committed on May 15

Commit 4345b13 (verified) · 1 parent: 0611686

Update README.md

Files changed (1)

README.md +1 -1

@@ -6,6 +6,6 @@ base_model:
 - princeton-nlp/Llama-3-Base-8B-SFT
 ---
-This model is fine-tuned from the princeton-nlp/Llama-3-Base-8B-SFT model using the SelectiveDPO algorithm on the Ultrafeedback_binarized dataset.
+This model is fine-tuned from the princeton-nlp/Llama-3-Base-8B-SFT model using the [SelectiveDPO](https://huggingface.co/papers/2502.09650) on the Ultrafeedback_binarized dataset.
 For the recipe to reproduce this model, please visit our [GitHub page](https://github.com/glorgao/SelectiveDPO).