zephyr-7b-mypo3_sim-qlora-beta0.05

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3523
  • Rewards/chosen: -0.0972
  • Rewards/rejected: -0.4130
  • Rewards/accuracies: 0.7300
  • Rewards/margins: 0.3157
  • Logps/rejected: -9.3049
  • Logps/chosen: -2.8553
  • Logits/rejected: -1.0338
  • Logits/chosen: -1.3373
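
To try the adapter, the snippet below is a minimal loading sketch using PEFT on top of the base SFT model. The repository ids come from this card; the dtype, device placement, and generation settings are illustrative assumptions rather than settings documented here.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Kimory-X/zephyr-7b-mypo3_sim-qlora-beta0.05"

# AutoPeftModelForCausalLM reads the adapter config, loads the referenced base model,
# and attaches the LoRA weights (assumes the adapter repo also ships a tokenizer).
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain QLoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```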

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
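
For orientation, here is a hedged sketch of how these hyperparameters could be wired into a preference-optimization run with TRL's DPOConfig/DPOTrainer. The card's objective ("mypo3_sim", with beta 0.05 inferred from the model name) is not plain DPO and the original training script is not included, so the trainer choice, dataset split, precision, and LoRA settings below are assumptions for illustration only.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-qlora"
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Mirrors the hyperparameter list above; 2 per device x 4 GPUs x 4 accumulation steps = 32.
training_args = DPOConfig(
    output_dir="zephyr-7b-mypo3_sim-qlora-beta0.05",
    beta=0.05,                      # inferred from the model name, not stated in the card
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,   # x 4 GPUs = total eval batch size 16
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # precision not stated in the card; assumed
)

model = AutoModelForCausalLM.from_pretrained(model_id)   # loads base model + SFT QLoRA adapter
tokenizer = AutoTokenizer.from_pretrained(model_id)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # LoRA rank/target modules not given in the card
)
trainer.train()
```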

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1.3852 | 0.0523 | 100 | 1.3852 | -0.0054 | -0.0088 | 0.6360 | 0.0034 | -1.2220 | -1.0192 | -2.1892 | -2.2997 |
| 1.3805 | 0.1047 | 200 | 1.3819 | -0.0387 | -0.0598 | 0.6560 | 0.0211 | -2.2420 | -1.6857 | -2.0060 | -2.1048 |
| 1.3788 | 0.1570 | 300 | 1.3812 | -0.1100 | -0.1667 | 0.6640 | 0.0566 | -4.3795 | -3.1120 | -0.9278 | -1.0358 |
| 1.3744 | 0.2094 | 400 | 1.3740 | -0.0667 | -0.1434 | 0.6900 | 0.0766 | -3.9128 | -2.2458 | -0.8555 | -1.0030 |
| 1.3637 | 0.2617 | 500 | 1.3727 | -0.1457 | -0.3621 | 0.6900 | 0.2164 | -8.2872 | -3.8245 | -0.3628 | -0.6041 |
| 1.3701 | 0.3141 | 600 | 1.3657 | -0.0592 | -0.1955 | 0.7200 | 0.1363 | -4.9564 | -2.0957 | -0.7314 | -0.9403 |
| 1.3626 | 0.3664 | 700 | 1.3622 | -0.1169 | -0.3824 | 0.6920 | 0.2655 | -8.6929 | -3.2488 | -0.3517 | -0.6505 |
| 1.3703 | 0.4187 | 800 | 1.3610 | -0.0846 | -0.3205 | 0.7080 | 0.2360 | -7.4567 | -2.6024 | -0.9665 | -1.2209 |
| 1.358 | 0.4711 | 900 | 1.3569 | -0.0754 | -0.3003 | 0.7200 | 0.2248 | -7.0514 | -2.4198 | -0.9631 | -1.2156 |
| 1.3607 | 0.5234 | 1000 | 1.3580 | -0.1146 | -0.4591 | 0.7100 | 0.3445 | -10.2278 | -3.2024 | -0.8770 | -1.1903 |
| 1.3578 | 0.5758 | 1100 | 1.3535 | -0.0851 | -0.3687 | 0.7160 | 0.2837 | -8.4204 | -2.6121 | -1.0014 | -1.2897 |
| 1.3507 | 0.6281 | 1200 | 1.3560 | -0.1279 | -0.4804 | 0.7220 | 0.3525 | -10.6542 | -3.4692 | -0.9235 | -1.2501 |
| 1.3557 | 0.6805 | 1300 | 1.3534 | -0.0997 | -0.4047 | 0.7140 | 0.3050 | -9.1398 | -2.9053 | -1.0419 | -1.3375 |
| 1.3353 | 0.7328 | 1400 | 1.3529 | -0.1015 | -0.4173 | 0.7160 | 0.3158 | -9.3917 | -2.9404 | -1.0273 | -1.3289 |
| 1.3492 | 0.7851 | 1500 | 1.3521 | -0.0878 | -0.3834 | 0.7220 | 0.2956 | -8.7129 | -2.6670 | -1.0582 | -1.3502 |
| 1.3546 | 0.8375 | 1600 | 1.3524 | -0.0983 | -0.4151 | 0.7340 | 0.3168 | -9.3471 | -2.8763 | -1.0242 | -1.3287 |
| 1.3474 | 0.8898 | 1700 | 1.3522 | -0.0968 | -0.4119 | 0.7300 | 0.3151 | -9.2844 | -2.8470 | -1.0294 | -1.3334 |
| 1.3512 | 0.9422 | 1800 | 1.3522 | -0.0971 | -0.4130 | 0.7300 | 0.3159 | -9.3066 | -2.8533 | -1.0291 | -1.3335 |
| 1.3648 | 0.9945 | 1900 | 1.3523 | -0.0972 | -0.4128 | 0.7320 | 0.3156 | -9.3015 | -2.8541 | -1.0339 | -1.3374 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1