Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ Alpha-Instruct is our latest language model, developed using 'Evolutionary Model
 - [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (Instruct)
 - [Llama-3-Open-Ko-8B](beomi/Llama-3-Open-Ko-8B) (Continual Pretrained)
 
-To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO]
+To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO](https://arxiv.org/abs/2403.07691) specifically for this "healing" (RLHF) phase. The datasets* used include:
 - [Korean-Human-Judgements](https://huggingface.co/datasets/HAERAE-HUB/Korean-Human-Judgements)
 - [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-word-problems-193k-korean)
 - [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)
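
For readers unfamiliar with ORPO, the objective described in the linked paper adds an odds-ratio penalty to the usual supervised loss on the preferred response. The snippet below is a minimal sketch of that per-example objective only, not the training code used for Alpha-Instruct. The function names, the use of length-averaged log-probabilities as inputs, and the weight `lam=0.1` are illustrative assumptions.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), where avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO objective for a single preference pair.

    Inputs are length-averaged log-probabilities of the chosen and rejected
    responses under the model being trained (an assumption of this sketch).
    `lam` weights the odds-ratio term; 0.1 is only an illustrative value.
    """
    # Supervised term: negative log-likelihood of the chosen response.
    sft = -chosen_avg_logp
    # Log odds ratio between the chosen and rejected responses.
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    odds_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return sft + lam * odds_term
```

The loss is lower when the model assigns a higher likelihood to the chosen response than to the rejected one, which is the behaviour a preference dataset such as those listed above is meant to encourage.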