Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the dpo_sg_values_p025_OA_silver dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1344
  • Rewards/chosen: -0.3139
  • Rewards/rejected: -3.7636
  • Rewards/accuracies: 0.9440
  • Rewards/margins: 3.4497
  • Logps/chosen: -5.2304
  • Logps/rejected: -44.2580
  • Logits/chosen: -0.7692
  • Logits/rejected: -0.7949
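
The reward metrics above are the implicit DPO rewards. Assuming the standard DPO objective (not stated explicitly in this card), each reward is the β-scaled log-probability ratio between the policy and the reference model, and the loss is:

$$\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)$$

Under this convention, Rewards/margins is Rewards/chosen minus Rewards/rejected (here −0.3139 − (−3.7636) ≈ 3.4497), and Rewards/accuracies is the fraction of preference pairs for which the chosen response receives the higher reward.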

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
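
A minimal sketch of how the hyperparameters above map onto TRL's DPOConfig; the output directory is a placeholder, field names follow recent TRL releases, and values not listed in this card (such as beta) are left at their defaults:

```python
from trl import DPOConfig

# Hedged sketch: mirrors the hyperparameters listed above.
# beta is not given in the card, so TRL's default is kept.
training_args = DPOConfig(
    output_dir="Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size: 2 * 4 = 8
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)
```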

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|--------------|----------------|---------------|-----------------|
| 0.6882        | 0.1495 | 250  | 0.6470          | -0.0027        | -0.0991          | 0.8740             | 0.0963          | -2.1189      | -7.6131        | -0.6586       | -0.6706         |
| 0.4574        | 0.2990 | 500  | 0.3221          | -0.1440        | -1.3807          | 0.8900             | 1.2367          | -3.5317      | -20.4297       | -0.7456       | -0.7637         |
| 0.1975        | 0.4486 | 750  | 0.1916          | -0.3061        | -2.8027          | 0.9300             | 2.4966          | -5.1523      | -34.6491       | -0.8112       | -0.8331         |
| 0.128         | 0.5981 | 1000 | 0.1556          | -0.3224        | -3.2965          | 0.9360             | 2.9741          | -5.3155      | -39.5871       | -0.7987       | -0.8227         |
| 0.1217        | 0.7476 | 1250 | 0.1421          | -0.3065        | -3.5741          | 0.9380             | 3.2675          | -5.1568      | -42.3631       | -0.7826       | -0.8077         |
| 0.0864        | 0.8971 | 1500 | 0.1368          | -0.3144        | -3.7298          | 0.9440             | 3.4154          | -5.2351      | -43.9201       | -0.7712       | -0.7968         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 2.21.0
  • Tokenizers 0.21.1
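
Since this model is a PEFT adapter on top of meta-llama/Llama-3.1-8B-Instruct, loading it for inference might look like the following sketch; the adapter repository id is taken from this card's name, and the dtype and generation settings are illustrative:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "Incomple/Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_silver"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```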