# tinyllama_moe_dpo_ultrafeedback_epochs5
This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat_epochs3, trained with DPO on the UltraFeedback dataset. It achieves the following results on the evaluation set:
- Loss: 0.5698
- Rewards/chosen: -1.5249
- Rewards/rejected: -2.1850
- Rewards/accuracies: 0.7460
- Rewards/margins: 0.6601
- Logps/rejected: -525.1185
- Logps/chosen: -501.3176
- Logits/rejected: -1.7144
- Logits/chosen: -1.8206
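These are the standard TRL DPO logging metrics: the implicit reward of a completion is the β-scaled log-probability ratio between the trained policy and the frozen SFT reference. As a sketch (β itself is not recorded in this card):

```latex
% Implicit DPO reward of completion y for prompt x
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Reported metrics
\mathrm{rewards/margins}    = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
\mathrm{rewards/accuracies} = \Pr\left[ r_\theta(x, y_{\mathrm{chosen}}) > r_\theta(x, y_{\mathrm{rejected}}) \right]
```

Consistent with these definitions, the final margin above is the chosen reward minus the rejected reward: -1.5249 - (-2.1850) = 0.6601.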
## Model description
More information needed
## Intended uses & limitations
More information needed
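In the absence of an official example, the snippet below is a minimal inference sketch. The repository id (derived from this card's title) and the presence of a chat template (inherited from the SFT base model) are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, derived from this card's title.
model_id = "ondevicellm/tinyllama_moe_dpo_ultrafeedback_epochs5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the prompt with the chat template assumed to be set during SFT.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```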
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 96
- num_epochs: 5
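For reproduction, these hyperparameters map onto trl's DPOTrainer roughly as follows. This is a sketch under the trl 0.7.x API contemporaneous with Transformers 4.36; the DPO β, the exact dataset, and its preprocessing are assumptions, since none are recorded above:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "ondevicellm/tinyllama_moe_sft_ultrachat_epochs3"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Assumed preference data: a binarized UltraFeedback split. In practice the
# chosen/rejected message lists must be flattened into plain prompt/chosen/
# rejected strings before being handed to DPOTrainer (preprocessing elided).
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = TrainingArguments(
    output_dir="tinyllama_moe_dpo_ultrafeedback_epochs5",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 4 GPUs x 2 accumulation steps = 64 total
    per_device_eval_batch_size=8,   # x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_steps=96,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the policy as the frozen reference model
    args=args,
    beta=0.1,        # assumed: the DPO beta is not recorded in this card
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```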
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.6916 | 0.1 | 100 | 0.6914 | 0.0044 | -0.0007 | 0.6290 | 0.0050 | -306.6867 | -348.3935 | -2.7893 | -2.8529 |
0.6838 | 0.21 | 200 | 0.6833 | 0.0222 | -0.0020 | 0.6548 | 0.0242 | -306.8183 | -346.6077 | -2.7745 | -2.8394 |
0.6683 | 0.31 | 300 | 0.6720 | 0.0087 | -0.0449 | 0.6647 | 0.0536 | -311.1083 | -347.9552 | -2.7457 | -2.8123 |
0.655 | 0.42 | 400 | 0.6583 | -0.0568 | -0.1516 | 0.6766 | 0.0948 | -321.7766 | -354.5066 | -2.6922 | -2.7610 |
0.6435 | 0.52 | 500 | 0.6453 | -0.1710 | -0.3165 | 0.6706 | 0.1455 | -338.2649 | -365.9251 | -2.6457 | -2.7160 |
0.641 | 0.63 | 600 | 0.6366 | -0.2120 | -0.3985 | 0.6825 | 0.1865 | -346.4684 | -370.0310 | -2.5893 | -2.6615 |
0.6207 | 0.73 | 700 | 0.6319 | -0.2615 | -0.4812 | 0.6706 | 0.2197 | -354.7403 | -374.9797 | -2.5381 | -2.6120 |
0.6025 | 0.84 | 800 | 0.6249 | -0.3306 | -0.5888 | 0.6964 | 0.2583 | -365.5025 | -381.8849 | -2.4644 | -2.5413 |
0.6317 | 0.94 | 900 | 0.6185 | -0.5597 | -0.8426 | 0.7063 | 0.2829 | -390.8784 | -404.7987 | -2.4027 | -2.4814 |
0.6087 | 1.05 | 1000 | 0.6137 | -0.5045 | -0.8126 | 0.7004 | 0.3081 | -387.8767 | -399.2817 | -2.3842 | -2.4637 |
0.5993 | 1.15 | 1100 | 0.6077 | -0.6040 | -0.9415 | 0.7044 | 0.3375 | -400.7663 | -409.2302 | -2.3436 | -2.4254 |
0.5628 | 1.26 | 1200 | 0.6026 | -0.8401 | -1.2238 | 0.7103 | 0.3837 | -429.0004 | -432.8431 | -2.2635 | -2.3475 |
0.5856 | 1.36 | 1300 | 0.5971 | -0.7421 | -1.1421 | 0.7242 | 0.3999 | -420.8279 | -423.0439 | -2.2233 | -2.3091 |
0.5672 | 1.47 | 1400 | 0.5930 | -0.7829 | -1.2202 | 0.7143 | 0.4373 | -428.6362 | -427.1146 | -2.1938 | -2.2804 |
0.5536 | 1.57 | 1500 | 0.5872 | -0.8347 | -1.2945 | 0.7202 | 0.4599 | -436.0717 | -432.2956 | -2.1433 | -2.2324 |
0.5669 | 1.67 | 1600 | 0.5858 | -0.7867 | -1.2636 | 0.7163 | 0.4769 | -432.9818 | -427.4996 | -2.1168 | -2.2065 |
0.5312 | 1.78 | 1700 | 0.5831 | -0.9925 | -1.4919 | 0.7262 | 0.4994 | -455.8103 | -448.0764 | -2.0492 | -2.1424 |
0.5596 | 1.88 | 1800 | 0.5798 | -1.0023 | -1.5297 | 0.7361 | 0.5274 | -459.5894 | -449.0625 | -2.0168 | -2.1124 |
0.5489 | 1.99 | 1900 | 0.5813 | -0.8832 | -1.3904 | 0.7202 | 0.5072 | -445.6621 | -437.1509 | -2.0039 | -2.0990 |
0.5327 | 2.09 | 2000 | 0.5795 | -0.9218 | -1.4418 | 0.7242 | 0.5200 | -450.7982 | -441.0125 | -1.9626 | -2.0594 |
0.5225 | 2.2 | 2100 | 0.5779 | -1.1696 | -1.7317 | 0.7401 | 0.5621 | -479.7868 | -465.7886 | -1.9207 | -2.0189 |
0.5085 | 2.3 | 2200 | 0.5769 | -1.1637 | -1.7425 | 0.7321 | 0.5789 | -480.8718 | -465.1949 | -1.8892 | -1.9891 |
0.5255 | 2.41 | 2300 | 0.5770 | -1.2191 | -1.7925 | 0.7341 | 0.5734 | -485.8650 | -470.7390 | -1.8632 | -1.9614 |
0.5116 | 2.51 | 2400 | 0.5742 | -1.1139 | -1.6936 | 0.7381 | 0.5798 | -475.9834 | -460.2151 | -1.8765 | -1.9749 |
0.5279 | 2.62 | 2500 | 0.5741 | -1.1556 | -1.7455 | 0.7361 | 0.5899 | -481.1734 | -464.3928 | -1.8664 | -1.9651 |
0.4795 | 2.72 | 2600 | 0.5745 | -1.1558 | -1.7459 | 0.7321 | 0.5900 | -481.2056 | -464.4143 | -1.8345 | -1.9355 |
0.5217 | 2.83 | 2700 | 0.5699 | -1.3475 | -1.9659 | 0.7440 | 0.6184 | -503.2092 | -483.5756 | -1.7956 | -1.8981 |
0.4945 | 2.93 | 2800 | 0.5699 | -1.3594 | -1.9727 | 0.7381 | 0.6132 | -503.8864 | -484.7731 | -1.8126 | -1.9141 |
0.477 | 3.04 | 2900 | 0.5721 | -1.3627 | -1.9877 | 0.7361 | 0.6250 | -505.3890 | -485.0972 | -1.7954 | -1.8980 |
0.4754 | 3.14 | 3000 | 0.5729 | -1.4117 | -2.0575 | 0.7321 | 0.6458 | -512.3726 | -490.0027 | -1.7473 | -1.8516 |
0.4696 | 3.24 | 3100 | 0.5708 | -1.5486 | -2.1921 | 0.7282 | 0.6435 | -525.8281 | -503.6902 | -1.7318 | -1.8363 |
0.4804 | 3.35 | 3200 | 0.5730 | -1.5037 | -2.1632 | 0.7321 | 0.6595 | -522.9344 | -499.1950 | -1.7097 | -1.8163 |
0.483 | 3.45 | 3300 | 0.5706 | -1.5793 | -2.2451 | 0.7302 | 0.6658 | -531.1252 | -506.7562 | -1.7082 | -1.8147 |
0.4791 | 3.56 | 3400 | 0.5723 | -1.4505 | -2.1095 | 0.7262 | 0.6590 | -517.5656 | -493.8777 | -1.7222 | -1.8274 |
0.4866 | 3.66 | 3500 | 0.5713 | -1.5091 | -2.1642 | 0.7381 | 0.6551 | -523.0358 | -499.7364 | -1.7191 | -1.8243 |
0.4651 | 3.77 | 3600 | 0.5731 | -1.4577 | -2.1177 | 0.7401 | 0.6600 | -518.3928 | -494.6030 | -1.7161 | -1.8217 |
0.483 | 3.87 | 3700 | 0.5708 | -1.4280 | -2.0759 | 0.7361 | 0.6479 | -514.2116 | -491.6330 | -1.7275 | -1.8325 |
0.4859 | 3.98 | 3800 | 0.5698 | -1.5249 | -2.1850 | 0.7460 | 0.6601 | -525.1185 | -501.3176 | -1.7144 | -1.8206 |
0.476 | 4.08 | 3900 | 0.5701 | -1.5060 | -2.1668 | 0.7440 | 0.6608 | -523.2975 | -499.4326 | -1.7157 | -1.8219 |
0.4553 | 4.19 | 4000 | 0.5705 | -1.5415 | -2.2042 | 0.7361 | 0.6626 | -527.0359 | -502.9834 | -1.7053 | -1.8120 |
0.4864 | 4.29 | 4100 | 0.5721 | -1.5310 | -2.1997 | 0.7381 | 0.6687 | -526.5859 | -501.9312 | -1.6982 | -1.8054 |
0.4402 | 4.4 | 4200 | 0.5720 | -1.5402 | -2.2110 | 0.7401 | 0.6708 | -527.7231 | -502.8538 | -1.6937 | -1.8008 |
0.4619 | 4.5 | 4300 | 0.5712 | -1.5462 | -2.2169 | 0.7361 | 0.6706 | -528.3046 | -503.4531 | -1.6931 | -1.8004 |
0.4421 | 4.6 | 4400 | 0.5710 | -1.5628 | -2.2323 | 0.7381 | 0.6695 | -529.8489 | -505.1078 | -1.6915 | -1.7989 |
0.4518 | 4.71 | 4500 | 0.5711 | -1.5704 | -2.2407 | 0.7361 | 0.6703 | -530.6893 | -505.8743 | -1.6913 | -1.7985 |
0.4508 | 4.81 | 4600 | 0.5715 | -1.5739 | -2.2436 | 0.7381 | 0.6697 | -530.9782 | -506.2146 | -1.6908 | -1.7981 |
0.484 | 4.92 | 4700 | 0.5716 | -1.5737 | -2.2419 | 0.7321 | 0.6682 | -530.8127 | -506.2016 | -1.6901 | -1.7976 |
### Framework versions
- Transformers 4.36.2
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.0