Text Generation
Transformers
PyTorch
Safetensors
English
llama
text-generation-inference
Inference Endpoints
chansurgeplus committed on
Commit 4e4e66b (verified) · 1 Parent(s): 06d2089

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
  ---
  # Model Card for OpenBezoar-HH-RLHF-DPO
 
- The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferences alignment using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290), on top of [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model on a subset of [Anthropic's HH-RLHF Dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
+ The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferences alignment using Direct Preference Optimization (DPO), on top of [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model on a subset of [Anthropic's HH-RLHF Dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
 
  ## Model Details
 
@@ -21,7 +21,7 @@ The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferen
 
  ### Model Description
 
- OpenBezoar-HH-RLHF-SFT is an LLM that is built upon the OpenLLaMA 3B v2 architecture. This model has been fine-tuned for human preferences alignment using [DPO](https://arxiv.org/abs/2305.18290). Alignment has been performed on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model. For more information please refer to our paper.
+ OpenBezoar-HH-RLHF-SFT is an LLM that is built upon the OpenLLaMA 3B v2 architecture. This model has been fine-tuned for human preferences alignment using DPO. Alignment has been performed on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model. For more information please refer to our paper.
 
  ### Model Sources
 