---
base_model: princeton-nlp/Llama-3-Base-8B-SFT
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: llama3-8b-mypo3_sim-full-beta12.5-lr4e-7
  results: []
---


# llama3-8b-mypo3_sim-full-beta12.5-lr4e-7

This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3762
- Rewards/chosen: 0.0655
- Rewards/rejected: -0.3701
- Rewards/accuracies: 0.7560
- Rewards/margins: 0.4356
- Logps/rejected: -1.5190
- Logps/chosen: -1.2659
- Logits/rejected: -1.1037
- Logits/chosen: -1.0759
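The reward metrics are linked. In TRL's `DPOTrainer` logging convention, `Rewards/margins` is the mean of (chosen reward - rejected reward), and `Rewards/accuracies` is the fraction of pairs in which the chosen reward is higher. The numbers above agree with the first relation: 0.0655 - (-0.3701) = 0.4356. The sketch below assumes that this run logs its metrics the same way, even though it uses a custom objective (`mypo3_sim` in the model name). The function name `preference_metrics` is hypothetical, and the per-pair rewards in the example are made up for illustration.

```python
from statistics import mean


def preference_metrics(chosen_rewards, rejected_rewards):
    """Hypothetical helper: summarize per-pair rewards as TRL-style trainers log them.

    Assumes both lists are aligned, i.e. element i of each list belongs to the same prompt.
    """
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need two non-empty lists of equal length")
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen response out-scores the rejected one.
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        # Mean per-pair margin; equal to mean(chosen) - mean(rejected).
        "rewards/margins": mean(margins),
    }


# Made-up per-pair rewards, for illustration only.
print(preference_metrics([0.3, -0.1, 0.2, 0.0], [-0.4, 0.1, -0.5, -0.2]))

# Consistency check on the reported evaluation numbers above.
print(round(0.0655 - (-0.3701), 4))  # 0.4356
```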

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

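The model was trained and evaluated on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a preference dataset that pairs each prompt with a chosen and a rejected response.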

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
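The totals follow from the per-device settings:

- `total_train_batch_size` = `train_batch_size` × `num_devices` × `gradient_accumulation_steps` = 4 × 4 × 2 = 32.
- `total_eval_batch_size` = 8 × 4 = 32.

The learning rate warms up linearly over the first 10% of optimizer steps and then decays along a cosine curve to zero. The sketch below reproduces both calculations. It assumes the Transformers cosine-with-warmup schedule (`get_cosine_schedule_with_warmup` with its default half cycle) and computes warmup steps as `ceil(warmup_ratio * total_steps)`. The card does not state the total step count. The training-results table implies a little over 1,900 steps, since step 1900 falls at epoch 0.9945, so the usage lines plug in 1911 as an approximation. The helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 4e-7
TRAIN_BATCH_SIZE = 4   # per device
EVAL_BATCH_SIZE = 8    # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1


def effective_batch_sizes():
    """Samples per optimizer step (train) and per eval step, summed over all devices."""
    train = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = EVAL_BATCH_SIZE * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup, then half-cosine decay to zero (assumed Transformers-style schedule)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_sizes())  # (32, 32)

# Approximate run length inferred from the results table; not stated in the card.
approx_total_steps = 1911
for s in (0, 100, 200, 1000, 1900):
    print(s, cosine_lr(s, approx_total_steps))
```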

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.379         | 0.0523 | 100  | 1.3804          | -0.0143        | -0.0874          | 0.6448             | 0.0731          | -1.4964        | -1.2723      | -1.0455         | -1.0137       |
| 1.3997        | 0.1047 | 200  | 1.4037          | -0.1231        | -0.3189          | 0.7024             | 0.1958          | -1.5149        | -1.2810      | -1.0477         | -1.0177       |
| 1.4069        | 0.1570 | 300  | 1.4016          | 0.1112         | -0.1817          | 0.7302             | 0.2929          | -1.5039        | -1.2623      | -1.0539         | -1.0256       |
| 1.4067        | 0.2094 | 400  | 1.4060          | 0.0174         | -0.3274          | 0.7202             | 0.3448          | -1.5156        | -1.2698      | -1.0205         | -0.9955       |
| 1.4144        | 0.2617 | 500  | 1.3973          | 0.0997         | -0.3029          | 0.7222             | 0.4026          | -1.5136        | -1.2632      | -1.0800         | -1.0518       |
| 1.4259        | 0.3141 | 600  | 1.4098          | 0.0202         | -0.3600          | 0.7242             | 0.3802          | -1.5182        | -1.2695      | -1.0593         | -1.0335       |
| 1.3595        | 0.3664 | 700  | 1.4119          | 0.0323         | -0.3666          | 0.7222             | 0.3989          | -1.5187        | -1.2686      | -1.0663         | -1.0400       |
| 1.449         | 0.4187 | 800  | 1.4198          | -0.0062        | -0.4193          | 0.7242             | 0.4130          | -1.5230        | -1.2716      | -1.0568         | -1.0320       |
| 1.4411        | 0.4711 | 900  | 1.4068          | 0.0924         | -0.3174          | 0.7500             | 0.4098          | -1.5148        | -1.2638      | -1.0695         | -1.0427       |
| 1.379         | 0.5234 | 1000 | 1.3951          | 0.1021         | -0.3451          | 0.7460             | 0.4471          | -1.5170        | -1.2630      | -1.0724         | -1.0471       |
| 1.4269        | 0.5758 | 1100 | 1.4001          | 0.2006         | -0.2040          | 0.7321             | 0.4046          | -1.5057        | -1.2551      | -1.0807         | -1.0548       |
| 1.3973        | 0.6281 | 1200 | 1.3843          | 0.0314         | -0.4097          | 0.7421             | 0.4411          | -1.5222        | -1.2686      | -1.0827         | -1.0560       |
| 1.3629        | 0.6805 | 1300 | 1.3831          | 0.0455         | -0.3913          | 0.7421             | 0.4367          | -1.5207        | -1.2675      | -1.0595         | -1.0347       |
| 1.3587        | 0.7328 | 1400 | 1.3861          | 0.1402         | -0.2996          | 0.7440             | 0.4398          | -1.5134        | -1.2599      | -1.0802         | -1.0539       |
| 1.3972        | 0.7851 | 1500 | 1.3793          | 0.0976         | -0.3469          | 0.7401             | 0.4445          | -1.5172        | -1.2633      | -1.0829         | -1.0565       |
| 1.3762        | 0.8375 | 1600 | 1.3783          | 0.0925         | -0.3479          | 0.7480             | 0.4404          | -1.5172        | -1.2637      | -1.0900         | -1.0631       |
| 1.3757        | 0.8898 | 1700 | 1.3774          | 0.0540         | -0.3880          | 0.7480             | 0.4420          | -1.5204        | -1.2668      | -1.0737         | -1.0482       |
| 1.3685        | 0.9422 | 1800 | 1.3773          | 0.0739         | -0.3636          | 0.7480             | 0.4375          | -1.5185        | -1.2652      | -1.0894         | -1.0627       |
| 1.3649        | 0.9945 | 1900 | 1.3769          | 0.0610         | -0.3706          | 0.7460             | 0.4315          | -1.5191        | -1.2663      | -1.1038         | -1.0760       |


### Framework versions

- Transformers 4.43.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
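To check whether a local environment matches the versions listed above, you can compare them against the installed packages. This is a minimal sketch, assuming the pip distribution names `transformers`, `torch`, `datasets` and `tokenizers`. It ignores local build tags such as `+cu121` and flags any other difference. Newer versions may still work, so a mismatch is only a hint. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by (assumed) pip distribution name.
TRAINED_WITH = {
    "transformers": "4.43.1",
    "torch": "2.1.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def base_version(v):
    """Drop a local build tag such as '+cu121' so CPU and CUDA builds compare equal."""
    return v.split("+", 1)[0]


def installed_versions(names):
    """Return {name: installed version string, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(expected, installed):
    """List (name, expected, installed) for each package whose base version differs."""
    return [
        (name, want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) is None
        or base_version(installed[name]) != base_version(want)
    ]


if __name__ == "__main__":
    current = installed_versions(TRAINED_WITH)
    for name, want, have in version_mismatches(TRAINED_WITH, current):
        print(f"{name}: card lists {want}, found {have or 'not installed'}")
```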