ShangZhu-Together committed on
Commit 1b00cdb · verified · 1 Parent(s): 1be467e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,7 +8,7 @@ tags: []
 
 ## Model Description
 
-This is the DPO model model in our Mixture of Agents Alignment (MoAA) pipeline. This model is tuned on the Llama-3.1-8b-Instruct. MoAA is an approach that leverages collective intelligence from open‑source LLMs to advance alignment.
+This is the DPO model in our Mixture of Agents Alignment (MoAA) pipeline. This model is tuned on the Llama-3.1-8b-Instruct. MoAA is an approach that leverages collective intelligence from open‑source LLMs to advance alignment.
 
 Two mains stages are involved in our MoAA method. In the first stage, we employ MoA to produce high-quality synthetic data for supervised fine-tuning. In the second stage, we combines multiple LLMs as a reward model to provide preference annotations.
 
@@ -60,7 +60,7 @@ Refer to [Paper](https://arxiv.org/abs/2505.03059) for metrics.
 
 
 
-## Citation [optional]
+## Citation
 ```
 @article{wang2025improving,
   title = {Improving Model Alignment Through Collective Intelligence of Open-Source LLMS},
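Since the card describes a DPO-tuned variant of Llama-3.1-8b-Instruct, a minimal usage sketch with the standard `transformers` chat-template API is shown below. The repo id used here is a hypothetical placeholder, not something stated in this commit; substitute the model's actual Hub id.

```python
# Minimal sketch: load the MoAA DPO-tuned chat model and generate a reply.
# The repo id below is an assumption (placeholder), not confirmed by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "together/MoAA-DPO-Llama-3.1-8B"  # hypothetical id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the MoAA alignment pipeline in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```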