Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset (not documented in this card). It achieves the following results on the evaluation set (the reward columns are explained in the note after the list):

  • Loss: 0.4437
  • Rewards/chosen: -1.5898
  • Rewards/rejected: -2.7509
  • Rewards/accuracies: 0.7000
  • Rewards/margins: 1.1611
  • Logps/rejected: -114.1047
  • Logps/chosen: -92.5540
  • Logits/rejected: -0.0729
  • Logits/chosen: -0.0526
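
The reward columns are DPO implicit rewards, i.e. β-scaled log-probability ratios between the fine-tuned policy and the reference model; the β used for this run is not recorded in the card. Rewards/margins is the mean chosen-minus-rejected gap (here 1.1611 = −1.5898 − (−2.7509)), and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected one. As a reference formula:

```latex
% DPO implicit reward for a completion y given prompt x (beta not recorded in this card):
r_\theta(x, y) = \beta \,\bigl[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr]

% Reported margin metric:
\text{Rewards/margins} = \mathbb{E}\bigl[\, r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \,\bigr]
```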

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TRL sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
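
The hyperparameters above map directly onto a TRL `DPOConfig`. The sketch below is an illustration only: the preference dataset, output directory, and DPO β are not documented in this card, and the exact `DPOTrainer` call signature varies across TRL releases.

```python
# Illustrative sketch only: reconstructs the training setup from the listed
# hyperparameters using TRL's DPOTrainer. The actual preference dataset,
# output directory, and DPO beta are not documented in this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default

# Placeholder preference pairs; the real dataset is not documented in the card.
train_dataset = Dataset.from_dict({
    "prompt":   ["Example prompt"],
    "chosen":   ["Preferred completion"],
    "rejected": ["Dispreferred completion"],
})

args = DPOConfig(
    output_dir="dpo-output",           # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,     # effective train batch size of 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,               # "processing_class=" in newer TRL releases
)
trainer.train()
```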

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6702        | 0.2993 | 66   | 0.6613          | 0.0837         | -0.0035          | 0.7000             | 0.0872          | -86.6308       | -75.8190     | 0.3314          | 0.3469        |
| 0.686         | 0.5986 | 132  | 0.5646          | 0.0172         | -0.3322          | 0.8000             | 0.3494          | -89.9173       | -76.4838     | 0.3494          | 0.3651        |
| 0.7758        | 0.8980 | 198  | 0.5747          | 0.0543         | -0.2153          | 0.9000             | 0.2696          | -88.7488       | -76.1133     | 0.3694          | 0.3845        |
| 0.6695        | 1.1973 | 264  | 0.5693          | -0.2661        | -0.6699          | 0.7000             | 0.4038          | -93.2946       | -79.3173     | 0.3321          | 0.3466        |
| 0.5453        | 1.4966 | 330  | 0.5472          | -0.6038        | -1.1332          | 0.6000             | 0.5294          | -97.9278       | -82.6945     | 0.2266          | 0.2424        |
| 0.5922        | 1.7959 | 396  | 0.5142          | -0.9005        | -1.6462          | 0.6000             | 0.7457          | -103.0579      | -85.6614     | 0.1303          | 0.1477        |
| 0.2128        | 2.0952 | 462  | 0.4825          | -1.1082        | -1.9752          | 0.8000             | 0.8670          | -106.3474      | -87.7384     | 0.0713          | 0.0898        |
| 0.1372        | 2.3946 | 528  | 0.4425          | -1.4160        | -2.5347          | 0.8000             | 1.1187          | -111.9428      | -90.8164     | -0.0224         | -0.0028       |
| 0.3622        | 2.6939 | 594  | 0.4437          | -1.5113        | -2.6570          | 0.8000             | 1.1457          | -113.1660      | -91.7698     | -0.0636         | -0.0435       |
| 0.1555        | 2.9932 | 660  | 0.4437          | -1.5898        | -2.7509          | 0.7000             | 1.1611          | -114.1047      | -92.5540     | -0.0729         | -0.0526       |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3
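
This repository ships a PEFT adapter rather than full model weights, so it must be loaded on top of the meta-llama/Llama-2-7b-hf base model. A minimal sketch, assuming access to the gated base checkpoint; the dtype and device settings are illustrative:

```python
# Minimal sketch: load the DPO adapter on top of the Llama-2-7b base model.
# Assumes access to the gated meta-llama checkpoint and a CUDA-capable GPU.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V7"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```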