llama3-8b-mypo3_sim-full-beta12.5-lr4e-7

This model is a fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3762
  • Rewards/chosen: 0.0655
  • Rewards/rejected: -0.3701
  • Rewards/accuracies: 0.7560
  • Rewards/margins: 0.4356
  • Logps/rejected: -1.5190
  • Logps/chosen: -1.2659
  • Logits/rejected: -1.1037
  • Logits/chosen: -1.0759
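
A minimal usage sketch, assuming the checkpoint is loaded through the standard transformers causal-LM API; the prompt and generation settings below are illustrative and are not part of the training or evaluation setup.

```python
# Minimal inference sketch (illustrative; prompt and generation settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aaaalongaa/llama3-8b-mypo3_sim-full-beta12.5-lr4e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```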

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
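
For reference, a hedged sketch of how these settings map onto transformers TrainingArguments. This is illustrative only: the actual training script and the mypo3_sim preference objective (beta 12.5) are not documented here, so only the optimizer, scheduler, and batching settings listed above are reproduced.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# The preference-optimization loss itself is not reproduced in this sketch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-mypo3_sim-full-beta12.5-lr4e-7",
    learning_rate=4e-7,
    per_device_train_batch_size=4,   # 4 devices x 2 accumulation steps -> total batch 32
    per_device_eval_batch_size=8,    # 4 devices -> total eval batch 32
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # train in bfloat16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```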

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.379 | 0.0523 | 100 | 1.3804 | -0.0143 | -0.0874 | 0.6448 | 0.0731 | -1.4964 | -1.2723 | -1.0455 | -1.0137 |
| 1.3997 | 0.1047 | 200 | 1.4037 | -0.1231 | -0.3189 | 0.7024 | 0.1958 | -1.5149 | -1.2810 | -1.0477 | -1.0177 |
| 1.4069 | 0.1570 | 300 | 1.4016 | 0.1112 | -0.1817 | 0.7302 | 0.2929 | -1.5039 | -1.2623 | -1.0539 | -1.0256 |
| 1.4067 | 0.2094 | 400 | 1.4060 | 0.0174 | -0.3274 | 0.7202 | 0.3448 | -1.5156 | -1.2698 | -1.0205 | -0.9955 |
| 1.4144 | 0.2617 | 500 | 1.3973 | 0.0997 | -0.3029 | 0.7222 | 0.4026 | -1.5136 | -1.2632 | -1.0800 | -1.0518 |
| 1.4259 | 0.3141 | 600 | 1.4098 | 0.0202 | -0.3600 | 0.7242 | 0.3802 | -1.5182 | -1.2695 | -1.0593 | -1.0335 |
| 1.3595 | 0.3664 | 700 | 1.4119 | 0.0323 | -0.3666 | 0.7222 | 0.3989 | -1.5187 | -1.2686 | -1.0663 | -1.0400 |
| 1.449 | 0.4187 | 800 | 1.4198 | -0.0062 | -0.4193 | 0.7242 | 0.4130 | -1.5230 | -1.2716 | -1.0568 | -1.0320 |
| 1.4411 | 0.4711 | 900 | 1.4068 | 0.0924 | -0.3174 | 0.75 | 0.4098 | -1.5148 | -1.2638 | -1.0695 | -1.0427 |
| 1.379 | 0.5234 | 1000 | 1.3951 | 0.1021 | -0.3451 | 0.7460 | 0.4471 | -1.5170 | -1.2630 | -1.0724 | -1.0471 |
| 1.4269 | 0.5758 | 1100 | 1.4001 | 0.2006 | -0.2040 | 0.7321 | 0.4046 | -1.5057 | -1.2551 | -1.0807 | -1.0548 |
| 1.3973 | 0.6281 | 1200 | 1.3843 | 0.0314 | -0.4097 | 0.7421 | 0.4411 | -1.5222 | -1.2686 | -1.0827 | -1.0560 |
| 1.3629 | 0.6805 | 1300 | 1.3831 | 0.0455 | -0.3913 | 0.7421 | 0.4367 | -1.5207 | -1.2675 | -1.0595 | -1.0347 |
| 1.3587 | 0.7328 | 1400 | 1.3861 | 0.1402 | -0.2996 | 0.7440 | 0.4398 | -1.5134 | -1.2599 | -1.0802 | -1.0539 |
| 1.3972 | 0.7851 | 1500 | 1.3793 | 0.0976 | -0.3469 | 0.7401 | 0.4445 | -1.5172 | -1.2633 | -1.0829 | -1.0565 |
| 1.3762 | 0.8375 | 1600 | 1.3783 | 0.0925 | -0.3479 | 0.7480 | 0.4404 | -1.5172 | -1.2637 | -1.0900 | -1.0631 |
| 1.3757 | 0.8898 | 1700 | 1.3774 | 0.0540 | -0.3880 | 0.7480 | 0.4420 | -1.5204 | -1.2668 | -1.0737 | -1.0482 |
| 1.3685 | 0.9422 | 1800 | 1.3773 | 0.0739 | -0.3636 | 0.7480 | 0.4375 | -1.5185 | -1.2652 | -1.0894 | -1.0627 |
| 1.3649 | 0.9945 | 1900 | 1.3769 | 0.0610 | -0.3706 | 0.7460 | 0.4315 | -1.5191 | -1.2663 | -1.1038 | -1.0760 |

Framework versions

  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1