zephyr-7b-dpo-full

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: -0.6585
  • Rewards/chosen: -1.5839
  • Rewards/rejected: -2.6984
  • Rewards/accuracies: 0.7857
  • Rewards/margins: 1.1145
  • Logps/rejected: -530.0449
  • Logps/chosen: -440.3677
  • Logits/rejected: 1.9295
  • Logits/chosen: 0.5950
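
For reference, a minimal generation sketch with 🤗 Transformers is shown below. This is an assumption-laden illustration, not part of the original card: the repo id is taken from the Hub page, the chat-template call assumes the tokenizer ships a Zephyr-style chat template, and the sampling settings are arbitrary.

```python
# Minimal inference sketch; repo id and chat-template usage are assumptions, not stated in the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YeongminKim/zephyr-7b-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what DPO fine-tuning does in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```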

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
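
These values line up with a standard trl DPOTrainer run across four GPUs (8 per-device × 4 devices × 2 accumulation steps = 64 effective batch size). The sketch below is a hedged reconstruction of such a setup, not the exact training script: the DPO beta, the dataset split, and the data preprocessing are assumptions; only the hyperparameters listed above come from this card.

```python
# Hedged reconstruction of a trl DPO run using the hyperparameters listed above.
# Assumptions (not from the card): beta value, dataset split, and that the
# chosen/rejected conversations have already been flattened into text columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# The card names HuggingFaceH4/ultrafeedback_binarized; "train_prefs" is the usual
# preference split, but the split actually used here is not stated.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,    # 8 x 4 GPUs x 2 accumulation steps = 64 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                        # the released weights are stored in BF16
    beta=0.1,                         # not listed in the card; illustrative value only
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

In practice a run like this would be launched across the four devices listed above (e.g. via `accelerate launch`), which is what the distributed_type and num_devices entries describe.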

Training results

| Training Loss | Epoch  | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:------:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.9314        | 0.1047 | 100  | -2.4608       | -2.3941         | -295.3837    | -296.4655      | 0.9248          | 0.7103             | -0.1341        | 0.2286          | -0.3626          |
| 0.8722        | 0.2093 | 200  | -1.9828       | -1.6165         | -334.3411    | -367.4581      | 0.8865          | 0.7520             | -0.5236        | 0.5489          | -1.0726          |
| 0.8208        | 0.3140 | 300  | -0.4089       | 0.3061          | -370.6316    | -428.0927      | 0.8215          | 0.7639             | -0.8865        | 0.7924          | -1.6789          |
| 0.8208        | 0.4186 | 400  | -0.4262       | 0.5905          | -401.0516    | -460.0637      | 0.7982          | 0.7718             | -1.1907        | 0.8079          | -1.9986          |
| 0.7826        | 0.5233 | 500  | 1.0156        | 2.2339          | -421.7270    | -504.0349      | 0.7799          | 0.7758             | -1.3975        | 1.0408          | -2.4383          |
| 0.7546        | 0.6279 | 600  | 0.3290        | 1.6798          | -437.6459    | -526.8406      | 0.7723          | 0.7837             | -1.5567        | 1.1097          | -2.6664          |
| 0.7533        | 0.7326 | 700  | 0.6982        | 2.0190          | -444.4420    | -531.2306      | 0.7732          | 0.7837             | -1.6247        | 1.0856          | -2.7103          |
| 0.7498        | 0.8373 | 800  | 0.4246        | 1.7010          | -437.6152    | -523.4053      | 0.7710          | 0.7857             | -1.5564        | 1.0756          | -2.6320          |
| 0.7471        | 0.9419 | 900  | 0.5837        | 1.9213          | -439.8712    | -529.8352      | 0.7707          | 0.7857             | -1.5789        | 1.1174          | -2.6963          |
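
In trl's DPOTrainer (the trainer these metric names come from), Rewards/chosen and Rewards/rejected are the beta-scaled log-probability differences between the policy and the reference model on the preferred and dispreferred responses, and Rewards/margins is simply their difference. The final row above checks out:

```python
# Margin bookkeeping check for the final evaluation row (step 900) above.
rewards_chosen = -1.5789
rewards_rejected = -2.6963
print(round(rewards_chosen - rewards_rejected, 4))  # 1.1174, the reported Rewards/margins
```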

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.1+cu118
  • Datasets 2.14.7
  • Tokenizers 0.19.1