Update README.md
README.md CHANGED

@@ -10,7 +10,7 @@ tags:
 ---
 # Model Card for OpenBezoar-HH-RLHF-DPO
 
-The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferences alignment using
+The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine-tuned for human preference alignment using Direct Preference Optimization (DPO), on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model, on a subset of [Anthropic's HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
 
 ## Model Details
 
@@ -21,7 +21,7 @@ The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferen
 
 ### Model Description
 
-OpenBezoar-HH-RLHF-SFT is an LLM that is built upon the OpenLLaMA 3B v2 architecture. This model has been fine-tuned for human preferences alignment using
+OpenBezoar-HH-RLHF-DPO is an LLM built upon the OpenLLaMA 3B v2 architecture. This model has been fine-tuned for human preference alignment using DPO. Alignment has been performed on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model. For more information, please refer to our paper.
 
 ### Model Sources
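The added description names Direct Preference Optimization as the alignment method. As a quick illustration of the objective (not of this model's training code), the per-pair DPO loss can be sketched in pure Python; the β value and log-probabilities below are made-up toy numbers:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    # Implicit reward margins: how much probability mass the policy has
    # shifted toward each response relative to the reference model.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Loss is -log(sigmoid(beta * (chosen_margin - rejected_margin))).
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Toy numbers: the policy favors the chosen response more than the
# reference does, so the loss drops below log(2), the chance-level value.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
```

Minimizing this loss pushes the policy's implicit reward for the chosen response above that of the rejected one, while β controls how far the policy may drift from the reference model.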