Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_gold

This model is a PEFT adapter fine-tuned from meta-llama/Llama-3.1-8B-Instruct on the dpo_sg_values_p025_OA_gold dataset using direct preference optimization (DPO). It achieves the following results on the evaluation set:

  • Loss: 0.1369
  • Rewards/chosen: -0.3243
  • Rewards/rejected: -3.7445
  • Rewards/accuracies: 0.9400
  • Rewards/margins: 3.4202
  • Logps/chosen: -5.3343
  • Logps/rejected: -44.0676
  • Logits/chosen: -0.8235
  • Logits/rejected: -0.8529
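
For reference, Rewards/margins is simply Rewards/chosen minus Rewards/rejected. Under the standard DPO formulation (an assumption here, since the card does not name the training framework), the reward assigned to a completion $y$ is the scaled log-probability ratio between the trained policy and the frozen reference model:

$$
r(x, y) = \beta \bigl( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr)
$$

so the final evaluation margin above works out to $-0.3243 - (-3.7445) = 3.4202$.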

Model description

More information needed

Intended uses & limitations

More information needed
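
No intended-use statement was provided by the author. As a starting point, here is a minimal inference sketch assuming the adapter is published under the repo id Incomple/Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_gold and is loaded on top of the base model with PEFT; adjust dtype and device settings to your hardware.

```python
# Minimal inference sketch (assumed repo id; verify before use).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "Incomple/Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_gold"  # assumed

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```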

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
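
The rewards/* metric names and the hyperparameters above are consistent with TRL's DPOTrainer, but the card does not state the training framework, so the reproduction sketch below is an assumption. The adapter type and rank are likewise unconfirmed (the card only lists PEFT 0.15.2), and the dataset path is hypothetical.

```python
# Reproduction sketch under stated assumptions (TRL DPOTrainer + LoRA adapter).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hyperparameters copied from the list above.
args = DPOConfig(
    output_dir="Llama-3.1-8B-Instruct_dpo_sg_values_p025_OA_gold",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # 2 per device x 4 steps = total batch size 8
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",  # betas=(0.9, 0.999) and epsilon=1e-08 are the AdamW defaults
)

# Hypothetical adapter settings: the card confirms a PEFT adapter, not its type or rank.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Hypothetical path; the card only gives the dataset name.
dataset = load_dataset("json", data_files="dpo_sg_values_p025_OA_gold.json")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
    peft_config=peft_config,
)
trainer.train()
```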

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:------------:|:--------------:|:-------------:|:---------------:|
| 0.677         | 0.1495 | 250  | 0.6611          | -0.0030        | -0.0688          | 0.8780             | 0.0658          | -2.1210      | -7.3104        | -0.6673       | -0.6792         |
| 0.4141        | 0.2990 | 500  | 0.3328          | -0.1370        | -1.2934          | 0.8960             | 1.1564          | -3.4617      | -19.5564       | -0.7405       | -0.7604         |
| 0.1869        | 0.4486 | 750  | 0.1943          | -0.3266        | -2.7983          | 0.9280             | 2.4718          | -5.3572      | -34.6058       | -0.8272       | -0.8525         |
| 0.1234        | 0.5981 | 1000 | 0.1579          | -0.3430        | -3.2984          | 0.9380             | 2.9554          | -5.5213      | -39.6065       | -0.8336       | -0.8610         |
| 0.122         | 0.7476 | 1250 | 0.1439          | -0.3187        | -3.5609          | 0.9360             | 3.2422          | -5.2784      | -42.2310       | -0.8265       | -0.8553         |
| 0.0821        | 0.8971 | 1500 | 0.1398          | -0.3263        | -3.7101          | 0.9340             | 3.3838          | -5.3544      | -43.7234       | -0.8241       | -0.8535         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 2.21.0
  • Tokenizers 0.21.1