Update README.md
README.md CHANGED

@@ -10,7 +10,7 @@ tags:
 ---
 # Model Card for OpenBezoar-HH-RLHF-DPO
 
-The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferences alignment using
+The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine-tuned for human preference alignment using Direct Preference Optimization (DPO), on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model, on a subset of [Anthropic's HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
 
 ## Model Details
 
@@ -21,7 +21,7 @@ The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferen
 
 ### Model Description
 
-OpenBezoar-HH-RLHF-SFT is an LLM that is built upon the OpenLLaMA 3B v2 architecture. This model has been fine-tuned for human preferences alignment using
+OpenBezoar-HH-RLHF-DPO is an LLM built upon the OpenLLaMA 3B v2 architecture. This model has been fine-tuned for human preference alignment using DPO. Alignment has been performed on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model. For more information, please refer to our paper.
 
 ### Model Sources
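The added description names Direct Preference Optimization as the alignment method. As a quick illustration of the objective (not of this model's training code), the per-pair DPO loss can be sketched in pure Python; the β value and log-probabilities below are made-up toy numbers:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    # Implicit reward margins: how much probability mass the policy has
    # shifted toward each response relative to the reference model.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Loss is -log(sigmoid(beta * (chosen_margin - rejected_margin))).
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Toy numbers: the policy favors the chosen response more than the
# reference does, so the loss drops below log(2), the chance-level value.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
```

Minimizing this loss pushes the policy's implicit reward for the chosen response above that of the rejected one, while β controls how far the policy may drift from the reference model.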