llm-blender/PairRM


yuchenlin committed on Nov 23, 2023

Commit 02215a3 • 1 Parent(s): e391c3f

Update README.md

Files changed (1)

README.md +6 -4


@@ -167,9 +167,11 @@ Because they wanted someone who could communicate complex ideas without making a
 ```
 ### Use case 3: RLHF
-PairRM has been trained on various high-quality and large-scale dataset with human preference annotations and exhibits great correlation with human preferences with an extremly small model size (0.4B), approching the performance of GPT-4.
-We believe PairRM will power the alignment of LLM in an efficient and effective way.
-With a `blender.compare()` function, you can easily apply PairRM to poopular RLHF toolkits like [trl](https://huggingface.co/docs/trl/index).
 **🔥 Check more details on our example jupyter notebook usage: [`blender_usage.ipynb`](https://github.com/yuchenlin/LLM-Blender/blob/main/blender_usage.ipynb)**
@@ -184,7 +186,7 @@ Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LL
 ### Context length
 |  PairRanker type  | Source max length | Candidate max length | Total max length |
 |:-----------------:|:-----------------:|----------------------|------------------|
-| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker)               | 128               | 128                  | 384              |
 | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224              | 412                  | 2048             |

 ```
 ### Use case 3: RLHF
+PairRM has been trained on various high-quality and large-scale datasets with human preference annotations
+and shown great correlation with human preferences with an extremely small model size (0.4B),
+approching the performance of GPT-4.
+PairRM will better help the future alignment of LLMs in a more efficient and effective way.
+With a `blender.compare()` function, you can apply PairRM to popular RLHF toolkits such as [trl](https://huggingface.co/docs/trl/index).
 **🔥 Check more details on our example jupyter notebook usage: [`blender_usage.ipynb`](https://github.com/yuchenlin/LLM-Blender/blob/main/blender_usage.ipynb)**
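As a rough illustration of how pairwise comparisons could feed a preference-based trainer, here is a minimal sketch. It assumes the comparison function takes three parallel lists (prompts, first candidates, second candidates) and returns one boolean per row that is true when the first candidate is preferred. The helper `build_preference_pairs`, the stand-in `longer_is_better`, and the `prompt` / `chosen` / `rejected` record layout are hypothetical names chosen for this example and are not part of this commit.

```python
from typing import Callable, Dict, List, Sequence

# Assumed shape of a pairwise comparator: three parallel lists in,
# one bool per row out (True means the first candidate is preferred).
CompareFn = Callable[[Sequence[str], Sequence[str], Sequence[str]], Sequence[bool]]


def build_preference_pairs(
    prompts: Sequence[str],
    candidates_a: Sequence[str],
    candidates_b: Sequence[str],
    compare_fn: CompareFn,
) -> List[Dict[str, str]]:
    """Turn pairwise comparison results into chosen/rejected records."""
    if not (len(prompts) == len(candidates_a) == len(candidates_b)):
        raise ValueError("prompts and both candidate lists must have equal length")

    a_is_better = compare_fn(prompts, candidates_a, candidates_b)

    pairs = []
    for prompt, a, b, a_wins in zip(prompts, candidates_a, candidates_b, a_is_better):
        # Keep the preferred candidate as "chosen" and the other as "rejected".
        chosen, rejected = (a, b) if a_wins else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def longer_is_better(prompts, candidates_a, candidates_b):
    """Hypothetical stand-in comparator so the sketch runs without a model."""
    return [len(a) > len(b) for a, b in zip(candidates_a, candidates_b)]


if __name__ == "__main__":
    demo = build_preference_pairs(
        ["What is 2 + 2?"],
        ["4"],
        ["2 + 2 equals 4."],
        longer_is_better,
    )
    print(demo)
```

With the real model loaded, `blender.compare` would presumably be passed in place of `longer_is_better`, provided its inputs and return value match the shape assumed above. The linked notebook is the place to confirm the exact call.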
 ### Context length
 |  PairRanker type  | Source max length | Candidate max length | Total max length |
 |:-----------------:|:-----------------:|----------------------|------------------|
+| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker)  (our previous version)             | 128               | 128                  | 384              |
 | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224              | 412                  | 2048             |
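The numbers in the table are consistent with one source plus two candidates sharing the total budget: 128 + 2 × 128 = 384 and 1224 + 2 × 412 = 2048. The sketch below checks that arithmetic and shows how per-field limits might be applied to token counts. That each field is clipped independently is an assumption made for illustration; `ContextLimits`, `pair_budget`, and `clipped_lengths` are hypothetical names, not library API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContextLimits:
    source_max: int
    candidate_max: int
    total_max: int


# Values copied from the context length table.
PAIR_RANKER = ContextLimits(source_max=128, candidate_max=128, total_max=384)
PAIRRM = ContextLimits(source_max=1224, candidate_max=412, total_max=2048)


def pair_budget(limits: ContextLimits) -> int:
    """Tokens needed for one source and two candidates at their maximum lengths."""
    return limits.source_max + 2 * limits.candidate_max


def clipped_lengths(
    source_tokens: int,
    cand_a_tokens: int,
    cand_b_tokens: int,
    limits: ContextLimits,
) -> Tuple[int, int, int]:
    """Clip each field to its own limit (assumed behavior, for illustration only)."""
    return (
        min(source_tokens, limits.source_max),
        min(cand_a_tokens, limits.candidate_max),
        min(cand_b_tokens, limits.candidate_max),
    )


if __name__ == "__main__":
    for name, limits in (("pair-ranker", PAIR_RANKER), ("PairRM", PAIRRM)):
        print(name, pair_budget(limits), limits.total_max)
```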