Update README.md
README.md
CHANGED
@@ -167,9 +167,11 @@ Because they wanted someone who could communicate complex ideas without making a
 ```
 
 ### Use case 3: RLHF
-PairRM has been trained on various high-quality and large-scale
-
-
+PairRM has been trained on various high-quality and large-scale datasets with human preference annotations
+and shown great correlation with human preferences with an extremely small model size (0.4B),
+approaching the performance of GPT-4.
+PairRM will better help the future alignment of LLMs in a more efficient and effective way.
+With a `blender.compare()` function, you can apply PairRM to popular RLHF toolkits such as [trl](https://huggingface.co/docs/trl/index).
 
 **🔥 Check more details on our example jupyter notebook usage: [`blender_usage.ipynb`](https://github.com/yuchenlin/LLM-Blender/blob/main/blender_usage.ipynb)**
 
@@ -184,7 +186,7 @@ Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LL
 ### Context length
 | PairRanker type | Source max length | Candidate max length | Total max length |
 |:-----------------:|:-----------------:|----------------------|------------------|
-| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker)
+| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) (our previous version) | 128 | 128 | 384 |
 | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
 
 
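The added README text says `blender.compare()` can feed RLHF toolkits such as trl. The following is a minimal sketch of what that hand-off might look like, not the authors' implementation. It assumes the comparison function takes three parallel lists (inputs, candidates A, candidates B) and returns one boolean per input that is true when candidate A is preferred. The helper `build_preference_pairs`, the stand-in `length_compare`, and the `prompt`/`chosen`/`rejected` record layout are illustrative choices, not names taken from this commit.

```python
from typing import Callable, Dict, List, Sequence

# Assumed shape of a pairwise comparator: (inputs, candidates_a, candidates_b)
# -> one bool per input, True when candidate A is judged better than B.
CompareFn = Callable[[Sequence[str], Sequence[str], Sequence[str]], Sequence[bool]]


def build_preference_pairs(
    prompts: Sequence[str],
    candidates_a: Sequence[str],
    candidates_b: Sequence[str],
    compare_fn: CompareFn,
) -> List[Dict[str, str]]:
    """Turn pairwise comparison results into prompt/chosen/rejected records."""
    if not (len(prompts) == len(candidates_a) == len(candidates_b)):
        raise ValueError("prompts and both candidate lists must have equal length")

    a_is_better = compare_fn(prompts, candidates_a, candidates_b)

    pairs = []
    for prompt, cand_a, cand_b, a_wins in zip(
        prompts, candidates_a, candidates_b, a_is_better
    ):
        # The preferred candidate becomes "chosen", the other "rejected".
        chosen, rejected = (cand_a, cand_b) if a_wins else (cand_b, cand_a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def length_compare(
    prompts: Sequence[str],
    candidates_a: Sequence[str],
    candidates_b: Sequence[str],
) -> List[bool]:
    """Hypothetical stand-in comparator for demonstration: prefers the shorter
    candidate. It only mimics the assumed call shape, not PairRM's judgments."""
    return [len(a) <= len(b) for a, b in zip(candidates_a, candidates_b)]
```

Under the assumption above, passing `blender.compare` as `compare_fn` in place of `length_compare` would yield records that a preference-based trainer could consume. Keeping the comparator as a parameter lets the pairing logic be checked without loading the model.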