Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ tags: []
## Model Description
- This is the DPO model
+ This is the DPO model in our Mixture of Agents Alignment (MoAA) pipeline. This model is fine-tuned from Llama-3.1-8B-Instruct. MoAA is an approach that leverages the collective intelligence of open-source LLMs to advance alignment.
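
A minimal loading sketch for the DPO model described above, assuming the standard `transformers` chat-template API; `example-org/llama-3.1-8b-moaa-dpo` is a placeholder repository id, not the actual repo name:

```python
# Minimal sketch: load the MoAA DPO model and run one chat turn.
# "example-org/llama-3.1-8b-moaa-dpo" is a placeholder repo id, not the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "example-org/llama-3.1-8b-moaa-dpo"  # substitute the actual repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
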
Two main stages are involved in our MoAA method. In the first stage, we employ MoA to produce high-quality synthetic data for supervised fine-tuning. In the second stage, we combine multiple LLMs into a reward model to provide preference annotations.
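
As a rough illustration of the second stage (a sketch, not the paper's actual implementation), the snippet below averages scores from several judge LLMs to turn a set of candidate responses into a (chosen, rejected) preference pair; the judge callables are hypothetical stand-ins for whatever API serves the judge models.

```python
# Illustrative sketch only: combine several judge LLMs into one reward signal
# and emit a DPO-style (chosen, rejected) pair. The judge callables are
# hypothetical stand-ins for real judge-model API calls.
from statistics import mean
from typing import Callable, List, Tuple

def annotate_preference(
    prompt: str,
    responses: List[str],
    judges: List[Callable[[str, str], float]],
) -> Tuple[str, str]:
    """Rank responses by their average judge score; best is chosen, worst rejected."""
    avg_scores = [mean(judge(prompt, r) for judge in judges) for r in responses]
    ranked = sorted(zip(avg_scores, responses), key=lambda pair: pair[0])
    return ranked[-1][1], ranked[0][1]
```
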
@@ -60,7 +60,7 @@ Refer to [Paper](https://arxiv.org/abs/2505.03059) for metrics.
## Citation
```
@article{wang2025improving,
  title = {Improving Model Alignment Through Collective Intelligence of Open-Source LLMs},