tinyllama_moe_dpo_ultrafeedback_epochs5

This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat_epochs3. The auto-generated card does not record the training dataset, but the model name indicates DPO preference tuning on UltraFeedback data. It achieves the following results on the evaluation set:

  • Loss: 0.5698
  • Rewards/chosen: -1.5249
  • Rewards/rejected: -2.1850
  • Rewards/accuracies: 0.7460
  • Rewards/margins: 0.6601
  • Logps/rejected: -525.1185
  • Logps/chosen: -501.3176
  • Logits/rejected: -1.7144
  • Logits/chosen: -1.8206
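These metrics follow the standard DPO bookkeeping: the implicit reward of a completion is beta times the policy-to-reference log-probability ratio, so Rewards/margins is simply Rewards/chosen minus Rewards/rejected (here -1.5249 - (-2.1850) ≈ 0.6601). Below is a minimal sketch of how such evaluation metrics are computed from per-sequence log-probabilities, assuming the standard sigmoid DPO loss and an assumed beta of 0.1 (trl's default; the actual value is not recorded in this card):

```python
import torch
import torch.nn.functional as F

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # beta=0.1 is an assumption (trl's default); the card does not record it.
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    # with log-probs summed over completion tokens.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    return {
        "loss": -F.logsigmoid(margins).mean(),  # sigmoid DPO loss
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/accuracies": (margins > 0).float().mean(),
        "rewards/margins": margins.mean(),
    }
```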

Model description

As the model name suggests, this is a TinyLlama-based Mixture-of-Experts model (~6.43B parameters, BF16 safetensors weights), aligned with DPO on top of the UltraChat SFT checkpoint ondevicellm/tinyllama_moe_sft_ultrachat_epochs3. No further details were provided by the authors.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 96
  • num_epochs: 5
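The setup above yields an effective train batch of 8 per device × 4 GPUs × 2 accumulation steps = 64 (and 8 × 4 = 32 for eval). Below is a minimal sketch of wiring these hyperparameters into trl's DPOTrainer (trl ≈ 0.7, contemporary with Transformers 4.36); the beta value and the dataset variables are assumptions, as neither is recorded in this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "ondevicellm/tinyllama_moe_sft_ultrachat_epochs3"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="tinyllama_moe_dpo_ultrafeedback_epochs5",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation = 64 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_steps=96,
    seed=42,
    bf16=True,                       # weights are published in BF16
)

# train_ds / eval_ds: hypothetical preference datasets with string
# "prompt", "chosen", "rejected" columns (e.g. chat-templated UltraFeedback).
trainer = DPOTrainer(
    model,
    ref_model=None,        # trl clones the model as a frozen reference when None
    args=args,
    beta=0.1,              # assumed; the card does not record beta
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```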

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6916 | 0.1 | 100 | 0.6914 | 0.0044 | -0.0007 | 0.6290 | 0.0050 | -306.6867 | -348.3935 | -2.7893 | -2.8529 |
| 0.6838 | 0.21 | 200 | 0.6833 | 0.0222 | -0.0020 | 0.6548 | 0.0242 | -306.8183 | -346.6077 | -2.7745 | -2.8394 |
| 0.6683 | 0.31 | 300 | 0.6720 | 0.0087 | -0.0449 | 0.6647 | 0.0536 | -311.1083 | -347.9552 | -2.7457 | -2.8123 |
| 0.655 | 0.42 | 400 | 0.6583 | -0.0568 | -0.1516 | 0.6766 | 0.0948 | -321.7766 | -354.5066 | -2.6922 | -2.7610 |
| 0.6435 | 0.52 | 500 | 0.6453 | -0.1710 | -0.3165 | 0.6706 | 0.1455 | -338.2649 | -365.9251 | -2.6457 | -2.7160 |
| 0.641 | 0.63 | 600 | 0.6366 | -0.2120 | -0.3985 | 0.6825 | 0.1865 | -346.4684 | -370.0310 | -2.5893 | -2.6615 |
| 0.6207 | 0.73 | 700 | 0.6319 | -0.2615 | -0.4812 | 0.6706 | 0.2197 | -354.7403 | -374.9797 | -2.5381 | -2.6120 |
| 0.6025 | 0.84 | 800 | 0.6249 | -0.3306 | -0.5888 | 0.6964 | 0.2583 | -365.5025 | -381.8849 | -2.4644 | -2.5413 |
| 0.6317 | 0.94 | 900 | 0.6185 | -0.5597 | -0.8426 | 0.7063 | 0.2829 | -390.8784 | -404.7987 | -2.4027 | -2.4814 |
| 0.6087 | 1.05 | 1000 | 0.6137 | -0.5045 | -0.8126 | 0.7004 | 0.3081 | -387.8767 | -399.2817 | -2.3842 | -2.4637 |
| 0.5993 | 1.15 | 1100 | 0.6077 | -0.6040 | -0.9415 | 0.7044 | 0.3375 | -400.7663 | -409.2302 | -2.3436 | -2.4254 |
| 0.5628 | 1.26 | 1200 | 0.6026 | -0.8401 | -1.2238 | 0.7103 | 0.3837 | -429.0004 | -432.8431 | -2.2635 | -2.3475 |
| 0.5856 | 1.36 | 1300 | 0.5971 | -0.7421 | -1.1421 | 0.7242 | 0.3999 | -420.8279 | -423.0439 | -2.2233 | -2.3091 |
| 0.5672 | 1.47 | 1400 | 0.5930 | -0.7829 | -1.2202 | 0.7143 | 0.4373 | -428.6362 | -427.1146 | -2.1938 | -2.2804 |
| 0.5536 | 1.57 | 1500 | 0.5872 | -0.8347 | -1.2945 | 0.7202 | 0.4599 | -436.0717 | -432.2956 | -2.1433 | -2.2324 |
| 0.5669 | 1.67 | 1600 | 0.5858 | -0.7867 | -1.2636 | 0.7163 | 0.4769 | -432.9818 | -427.4996 | -2.1168 | -2.2065 |
| 0.5312 | 1.78 | 1700 | 0.5831 | -0.9925 | -1.4919 | 0.7262 | 0.4994 | -455.8103 | -448.0764 | -2.0492 | -2.1424 |
| 0.5596 | 1.88 | 1800 | 0.5798 | -1.0023 | -1.5297 | 0.7361 | 0.5274 | -459.5894 | -449.0625 | -2.0168 | -2.1124 |
| 0.5489 | 1.99 | 1900 | 0.5813 | -0.8832 | -1.3904 | 0.7202 | 0.5072 | -445.6621 | -437.1509 | -2.0039 | -2.0990 |
| 0.5327 | 2.09 | 2000 | 0.5795 | -0.9218 | -1.4418 | 0.7242 | 0.5200 | -450.7982 | -441.0125 | -1.9626 | -2.0594 |
| 0.5225 | 2.2 | 2100 | 0.5779 | -1.1696 | -1.7317 | 0.7401 | 0.5621 | -479.7868 | -465.7886 | -1.9207 | -2.0189 |
| 0.5085 | 2.3 | 2200 | 0.5769 | -1.1637 | -1.7425 | 0.7321 | 0.5789 | -480.8718 | -465.1949 | -1.8892 | -1.9891 |
| 0.5255 | 2.41 | 2300 | 0.5770 | -1.2191 | -1.7925 | 0.7341 | 0.5734 | -485.8650 | -470.7390 | -1.8632 | -1.9614 |
| 0.5116 | 2.51 | 2400 | 0.5742 | -1.1139 | -1.6936 | 0.7381 | 0.5798 | -475.9834 | -460.2151 | -1.8765 | -1.9749 |
| 0.5279 | 2.62 | 2500 | 0.5741 | -1.1556 | -1.7455 | 0.7361 | 0.5899 | -481.1734 | -464.3928 | -1.8664 | -1.9651 |
| 0.4795 | 2.72 | 2600 | 0.5745 | -1.1558 | -1.7459 | 0.7321 | 0.5900 | -481.2056 | -464.4143 | -1.8345 | -1.9355 |
| 0.5217 | 2.83 | 2700 | 0.5699 | -1.3475 | -1.9659 | 0.7440 | 0.6184 | -503.2092 | -483.5756 | -1.7956 | -1.8981 |
| 0.4945 | 2.93 | 2800 | 0.5699 | -1.3594 | -1.9727 | 0.7381 | 0.6132 | -503.8864 | -484.7731 | -1.8126 | -1.9141 |
| 0.477 | 3.04 | 2900 | 0.5721 | -1.3627 | -1.9877 | 0.7361 | 0.6250 | -505.3890 | -485.0972 | -1.7954 | -1.8980 |
| 0.4754 | 3.14 | 3000 | 0.5729 | -1.4117 | -2.0575 | 0.7321 | 0.6458 | -512.3726 | -490.0027 | -1.7473 | -1.8516 |
| 0.4696 | 3.24 | 3100 | 0.5708 | -1.5486 | -2.1921 | 0.7282 | 0.6435 | -525.8281 | -503.6902 | -1.7318 | -1.8363 |
| 0.4804 | 3.35 | 3200 | 0.5730 | -1.5037 | -2.1632 | 0.7321 | 0.6595 | -522.9344 | -499.1950 | -1.7097 | -1.8163 |
| 0.483 | 3.45 | 3300 | 0.5706 | -1.5793 | -2.2451 | 0.7302 | 0.6658 | -531.1252 | -506.7562 | -1.7082 | -1.8147 |
| 0.4791 | 3.56 | 3400 | 0.5723 | -1.4505 | -2.1095 | 0.7262 | 0.6590 | -517.5656 | -493.8777 | -1.7222 | -1.8274 |
| 0.4866 | 3.66 | 3500 | 0.5713 | -1.5091 | -2.1642 | 0.7381 | 0.6551 | -523.0358 | -499.7364 | -1.7191 | -1.8243 |
| 0.4651 | 3.77 | 3600 | 0.5731 | -1.4577 | -2.1177 | 0.7401 | 0.6600 | -518.3928 | -494.6030 | -1.7161 | -1.8217 |
| 0.483 | 3.87 | 3700 | 0.5708 | -1.4280 | -2.0759 | 0.7361 | 0.6479 | -514.2116 | -491.6330 | -1.7275 | -1.8325 |
| 0.4859 | 3.98 | 3800 | 0.5698 | -1.5249 | -2.1850 | 0.7460 | 0.6601 | -525.1185 | -501.3176 | -1.7144 | -1.8206 |
| 0.476 | 4.08 | 3900 | 0.5701 | -1.5060 | -2.1668 | 0.7440 | 0.6608 | -523.2975 | -499.4326 | -1.7157 | -1.8219 |
| 0.4553 | 4.19 | 4000 | 0.5705 | -1.5415 | -2.2042 | 0.7361 | 0.6626 | -527.0359 | -502.9834 | -1.7053 | -1.8120 |
| 0.4864 | 4.29 | 4100 | 0.5721 | -1.5310 | -2.1997 | 0.7381 | 0.6687 | -526.5859 | -501.9312 | -1.6982 | -1.8054 |
| 0.4402 | 4.4 | 4200 | 0.5720 | -1.5402 | -2.2110 | 0.7401 | 0.6708 | -527.7231 | -502.8538 | -1.6937 | -1.8008 |
| 0.4619 | 4.5 | 4300 | 0.5712 | -1.5462 | -2.2169 | 0.7361 | 0.6706 | -528.3046 | -503.4531 | -1.6931 | -1.8004 |
| 0.4421 | 4.6 | 4400 | 0.5710 | -1.5628 | -2.2323 | 0.7381 | 0.6695 | -529.8489 | -505.1078 | -1.6915 | -1.7989 |
| 0.4518 | 4.71 | 4500 | 0.5711 | -1.5704 | -2.2407 | 0.7361 | 0.6703 | -530.6893 | -505.8743 | -1.6913 | -1.7985 |
| 0.4508 | 4.81 | 4600 | 0.5715 | -1.5739 | -2.2436 | 0.7381 | 0.6697 | -530.9782 | -506.2146 | -1.6908 | -1.7981 |
| 0.484 | 4.92 | 4700 | 0.5716 | -1.5737 | -2.2419 | 0.7321 | 0.6682 | -530.8127 | -506.2016 | -1.6901 | -1.7976 |
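The headline evaluation numbers at the top of this card correspond to the step-3800 row (epoch 3.98), which has the lowest validation loss of the run (0.5698); the final epoch-5 checkpoints plateau marginally higher. If the training output directory is available, the best eval step can be recovered from the trainer state; a small sketch, assuming the standard trainer_state.json that Transformers writes to the (hypothetical) output directory:

```python
import json

# Hypothetical path: the output directory used during training.
with open("tinyllama_moe_dpo_ultrafeedback_epochs5/trainer_state.json") as f:
    state = json.load(f)

# log_history mixes train and eval entries; keep only the eval ones.
evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"], best["eval_loss"])  # expected here: 3800 0.5698
```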

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.0
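
A minimal inference sketch with the published checkpoint, assuming it loads through the standard AutoModelForCausalLM path and that the tokenizer carries a chat template from the UltraChat SFT stage (both assumptions; trust_remote_code may be needed if the MoE architecture is custom):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ondevicellm/tinyllama_moe_dpo_ultrafeedback_epochs5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```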