
v1_1000_STEPS_1e6_rate_03_beta_DPO

This model is a version of mosaicml/mpt-7b-instruct fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6641
  • Rewards/chosen: -1.4066
  • Rewards/rejected: -1.6576
  • Rewards/accuracies: 0.6198
  • Rewards/margins: 0.2510
  • Logps/rejected: -27.0829
  • Logps/chosen: -25.4808
  • Logits/rejected: 13.3887
  • Logits/chosen: 13.3921
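
For reference, the reward metrics above appear to follow the standard DPO formulation (Rafailov et al., 2023): the implicit reward of a completion y given a prompt x is the β-scaled log-ratio of the policy and the frozen reference model. A brief sketch, assuming β = 0.3 as suggested by the "03_beta" in the model name:

$$r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\big(r(x, y_w) - r(x, y_l)\big)\right]$$

Under this reading, Rewards/chosen and Rewards/rejected are the mean implicit rewards of the preferred and rejected completions, Rewards/margins is the mean of their difference, and Rewards/accuracies is the fraction of evaluation pairs in which the chosen reward exceeds the rejected one.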

Model description

More information needed

Intended uses & limitations

More information needed
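
Until the intended uses are documented, the sketch below shows one way to try the checkpoint. The repository id is taken from this card's model tree, and it is assumed that, like the base MPT models, the checkpoint ships custom modeling code and therefore needs trust_remote_code=True:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mpt_1000_STEPS_1e6_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # the published weights are FP16 safetensors
    trust_remote_code=True,      # MPT uses custom modeling code
)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```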

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
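
The card does not say which trainer produced these numbers, but the reward metrics strongly suggest TRL's DPOTrainer. Below is a minimal sketch that wires up the hyperparameters above, assuming a TRL version contemporary with Transformers 4.39.1 (where beta is still a DPOTrainer argument), β = 0.3 from the model name, and a placeholder preference dataset since the real one is unknown:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mosaicml/mpt-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # the MPT tokenizer has no pad token

# Placeholder preference pairs; the actual training dataset is not documented.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization trains on ranked response pairs."],
    "rejected": ["I don't know."],
})

training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e6_rate_03_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size 2 x 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # required by TRL's DPO data collator
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # TRL clones a frozen reference model when None is passed
    beta=0.3,         # assumed from "03_beta" in the model name
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Adam with betas=(0.9,0.999) and epsilon=1e-08 is the TrainingArguments default, so it needs no explicit flags.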

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6901 | 0.05 | 50 | 0.6931 | 0.0510 | 0.0490 | 0.5253 | 0.0019 | -21.3940 | -20.6223 | 14.3181 | 14.3207 |
| 0.7257 | 0.1 | 100 | 0.6841 | 0.0934 | 0.0501 | 0.5692 | 0.0433 | -21.3906 | -20.4809 | 14.1613 | 14.1641 |
| 0.7259 | 0.15 | 150 | 0.6925 | -0.0147 | -0.0834 | 0.5451 | 0.0688 | -21.8355 | -20.8411 | 13.9200 | 13.9229 |
| 0.6593 | 0.2 | 200 | 0.7118 | 0.4903 | 0.3962 | 0.5802 | 0.0941 | -20.2368 | -19.1579 | 13.7791 | 13.7821 |
| 0.7282 | 0.24 | 250 | 0.7093 | -1.2326 | -1.3686 | 0.5648 | 0.1360 | -26.1195 | -24.9010 | 13.8037 | 13.8067 |
| 0.6924 | 0.29 | 300 | 0.6944 | -0.7898 | -0.9655 | 0.5626 | 0.1757 | -24.7758 | -23.4250 | 14.0496 | 14.0528 |
| 0.7523 | 0.34 | 350 | 0.6909 | -0.9371 | -1.1226 | 0.5626 | 0.1855 | -25.2994 | -23.9158 | 14.0003 | 14.0037 |
| 0.7276 | 0.39 | 400 | 0.6918 | -1.8471 | -2.0415 | 0.5868 | 0.1944 | -28.3625 | -26.9492 | 13.3382 | 13.3414 |
| 0.6255 | 0.44 | 450 | 0.6860 | -1.5470 | -1.7599 | 0.5934 | 0.2129 | -27.4236 | -25.9489 | 13.2551 | 13.2584 |
| 0.7342 | 0.49 | 500 | 0.6801 | -1.5841 | -1.7888 | 0.5758 | 0.2046 | -27.5199 | -26.0726 | 13.4186 | 13.4219 |
| 0.568 | 0.54 | 550 | 0.6694 | -1.5101 | -1.7458 | 0.6022 | 0.2356 | -27.3766 | -25.8260 | 13.5776 | 13.5810 |
| 0.6217 | 0.59 | 600 | 0.6645 | -1.4050 | -1.6543 | 0.6110 | 0.2492 | -27.0716 | -25.4756 | 13.6337 | 13.6371 |
| 0.6186 | 0.64 | 650 | 0.6682 | -1.3826 | -1.6291 | 0.5978 | 0.2465 | -26.9876 | -25.4007 | 13.4204 | 13.4237 |
| 0.6637 | 0.68 | 700 | 0.6633 | -1.3994 | -1.6501 | 0.6220 | 0.2507 | -27.0576 | -25.4569 | 13.4574 | 13.4608 |
| 0.7482 | 0.73 | 750 | 0.6632 | -1.3772 | -1.6269 | 0.6198 | 0.2497 | -26.9804 | -25.3829 | 13.4047 | 13.4081 |
| 0.6597 | 0.78 | 800 | 0.6627 | -1.3970 | -1.6527 | 0.6198 | 0.2557 | -27.0664 | -25.4489 | 13.3914 | 13.3948 |
| 0.7206 | 0.83 | 850 | 0.6613 | -1.4018 | -1.6593 | 0.6220 | 0.2575 | -27.0885 | -25.4648 | 13.3862 | 13.3896 |
| 0.6715 | 0.88 | 900 | 0.6633 | -1.4047 | -1.6584 | 0.6220 | 0.2537 | -27.0856 | -25.4746 | 13.3969 | 13.4003 |
| 0.6108 | 0.93 | 950 | 0.6633 | -1.4042 | -1.6585 | 0.6242 | 0.2543 | -27.0857 | -25.4727 | 13.3883 | 13.3917 |
| 0.5964 | 0.98 | 1000 | 0.6641 | -1.4066 | -1.6576 | 0.6198 | 0.2510 | -27.0829 | -25.4808 | 13.3887 | 13.3921 |

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2

The published weights are FP16 safetensors, 6.65B parameters.
