Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ tags: []
## Model Description
- This is the DPO model
+ This is the DPO model in our Mixture of Agents Alignment (MoAA) pipeline. This model is fine-tuned from Llama-3.1-8B-Instruct. MoAA is an approach that leverages the collective intelligence of open-source LLMs to advance alignment.
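
A minimal loading sketch for the DPO model described above, assuming the standard `transformers` chat-template API; `example-org/llama-3.1-8b-moaa-dpo` is a placeholder repository id, not the actual repo name:

```python
# Minimal sketch: load the MoAA DPO model and run one chat turn.
# "example-org/llama-3.1-8b-moaa-dpo" is a placeholder repo id, not the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "example-org/llama-3.1-8b-moaa-dpo"  # substitute the actual repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
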
Two main stages are involved in our MoAA method. In the first stage, we employ MoA to produce high-quality synthetic data for supervised fine-tuning. In the second stage, we combine multiple LLMs into a reward model to provide preference annotations.
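
As a rough illustration of the second stage (a sketch, not the paper's actual implementation), the snippet below averages scores from several judge LLMs to turn a set of candidate responses into a (chosen, rejected) preference pair; the judge callables are hypothetical stand-ins for whatever API serves the judge models.

```python
# Illustrative sketch only: combine several judge LLMs into one reward signal
# and emit a DPO-style (chosen, rejected) pair. The judge callables are
# hypothetical stand-ins for real judge-model API calls.
from statistics import mean
from typing import Callable, List, Tuple

def annotate_preference(
    prompt: str,
    responses: List[str],
    judges: List[Callable[[str, str], float]],
) -> Tuple[str, str]:
    """Rank responses by their average judge score; best is chosen, worst rejected."""
    avg_scores = [mean(judge(prompt, r) for judge in judges) for r in responses]
    ranked = sorted(zip(avg_scores, responses), key=lambda pair: pair[0])
    return ranked[-1][1], ranked[0][1]
```
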
@@ -60,7 +60,7 @@ Refer to [Paper](https://arxiv.org/abs/2505.03059) for metrics.
## Citation
```
@article{wang2025improving,
  title = {Improving Model Alignment Through Collective Intelligence of Open-Source LLMs},