zephyr-7b-nca_pair-qlora-lr5e6-beta0.1

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3408
  • Rewards/chosen: 0.1419
  • Rewards/rejected: -0.3279
  • Rewards/accuracies: 0.7450
  • Rewards/margins: 0.4698
  • Logps/rejected: -257.3817
  • Logps/chosen: -270.7577
  • Logits/rejected: -2.2321
  • Logits/chosen: -2.3119
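
This repository contains a QLoRA adapter rather than full model weights, so one way to try it is to attach the adapter with PEFT. The snippet below is a hedged sketch, not part of the original card: it assumes the adapter config points at a resolvable base model (if the recorded base is itself an adapter repository such as alignment-handbook/zephyr-7b-sft-qlora, the base weights and adapters may need to be loaded and stacked manually) and that tokenizer files are available in this repository.

```python
# Minimal loading sketch (assumptions noted above): AutoPeftModelForCausalLM reads
# the base model name from the adapter config and attaches this LoRA adapter to it.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "Kimory-X/zephyr-7b-nca_pair-qlora-lr5e6-beta0.1"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Zephyr-style chat prompt; apply_chat_template returns the tokenized prompt.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```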

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 5
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 40
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
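
As a hedged illustration (the actual alignment-handbook training script and the NCA pair-loss trainer are not reproduced here), the hyperparameters above map onto a standard transformers TrainingArguments object as sketched below, and the reported total batch sizes follow directly from the per-device values; the output_dir is a hypothetical placeholder.

```python
# Sketch only: mirrors the hyperparameters listed above with standard
# TrainingArguments fields; the NCA pair-loss trainer itself is not shown.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="zephyr-7b-nca_pair-qlora-lr5e6-beta0.1",  # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# The reported totals follow from per-device batch size x accumulation x devices:
num_devices = 5
total_train_batch_size = 2 * 4 * num_devices  # = 40
total_eval_batch_size = 4 * num_devices       # = 20
print(total_train_batch_size, total_eval_batch_size)
```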

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.3652 | 0.0654 | 100 | 1.3585 | 0.0179 | -0.2204 | 0.7200 | 0.2383 | -256.3067 | -271.9973 | -2.2040 | -2.2872 |
| 1.3457 | 0.1308 | 200 | 1.3529 | 0.1799 | -0.2015 | 0.7425 | 0.3814 | -256.1176 | -270.3774 | -2.2080 | -2.2912 |
| 1.3328 | 0.1963 | 300 | 1.3500 | 0.1269 | -0.2919 | 0.7150 | 0.4188 | -257.0218 | -270.9071 | -2.2303 | -2.3106 |
| 1.3452 | 0.2617 | 400 | 1.3536 | 0.1854 | -0.2395 | 0.7200 | 0.4249 | -256.4976 | -270.3225 | -2.2250 | -2.3062 |
| 1.3446 | 0.3271 | 500 | 1.3501 | 0.0859 | -0.3936 | 0.7275 | 0.4795 | -258.0389 | -271.3175 | -2.1984 | -2.2818 |
| 1.333 | 0.3925 | 600 | 1.3496 | 0.0493 | -0.3851 | 0.7450 | 0.4344 | -257.9544 | -271.6837 | -2.2107 | -2.2937 |
| 1.3577 | 0.4580 | 700 | 1.3457 | 0.1306 | -0.2688 | 0.7175 | 0.3994 | -256.7908 | -270.8706 | -2.2100 | -2.2934 |
| 1.343 | 0.5234 | 800 | 1.3449 | 0.0814 | -0.3810 | 0.7150 | 0.4623 | -257.9127 | -271.3629 | -2.2312 | -2.3121 |
| 1.3439 | 0.5888 | 900 | 1.3459 | 0.0385 | -0.4054 | 0.7250 | 0.4439 | -258.1573 | -271.7917 | -2.2327 | -2.3137 |
| 1.3388 | 0.6542 | 1000 | 1.3442 | 0.2150 | -0.2625 | 0.7325 | 0.4775 | -256.7277 | -270.0262 | -2.2387 | -2.3183 |
| 1.3186 | 0.7197 | 1100 | 1.3423 | 0.1242 | -0.3587 | 0.7325 | 0.4829 | -257.6895 | -270.9345 | -2.2306 | -2.3107 |
| 1.3299 | 0.7851 | 1200 | 1.3417 | 0.1468 | -0.3270 | 0.7425 | 0.4737 | -257.3728 | -270.7089 | -2.2275 | -2.3078 |
| 1.3248 | 0.8505 | 1300 | 1.3413 | 0.1555 | -0.3132 | 0.7525 | 0.4687 | -257.2347 | -270.6216 | -2.2306 | -2.3105 |
| 1.3398 | 0.9159 | 1400 | 1.3414 | 0.1409 | -0.3251 | 0.7475 | 0.4660 | -257.3535 | -270.7675 | -2.2317 | -2.3117 |
| 1.325 | 0.9814 | 1500 | 1.3409 | 0.1433 | -0.3268 | 0.7475 | 0.4701 | -257.3707 | -270.7436 | -2.2339 | -2.3137 |
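
As a quick sanity check on the columns above (assuming, as is conventional for preference-optimization trainers, that the reported margin is simply the chosen reward minus the rejected reward):

```python
# Final eval row (step 1500): the margin column matches chosen minus rejected rewards.
rewards_chosen, rewards_rejected = 0.1433, -0.3268
print(round(rewards_chosen - rewards_rejected, 4))  # 0.4701
```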

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1