tinyllama_moe_dpo_ultrafeedback_epochs5

This model is a fine-tuned version of ondevicellm/tinyllama_moe_sft_ultrachat_epochs3. The auto-generated card does not record the training dataset, but the model name indicates DPO preference tuning on UltraFeedback data. It achieves the following results on the evaluation set:

  • Loss: 0.5698
  • Rewards/chosen: -1.5249
  • Rewards/rejected: -2.1850
  • Rewards/accuracies: 0.7460
  • Rewards/margins: 0.6601
  • Logps/rejected: -525.1185
  • Logps/chosen: -501.3176
  • Logits/rejected: -1.7144
  • Logits/chosen: -1.8206
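These metrics follow the standard DPO bookkeeping: the implicit reward of a completion is beta times the policy-to-reference log-probability ratio, so Rewards/margins is simply Rewards/chosen minus Rewards/rejected (here -1.5249 - (-2.1850) ≈ 0.6601). Below is a minimal sketch of how such evaluation metrics are computed from per-sequence log-probabilities, assuming the standard sigmoid DPO loss and an assumed beta of 0.1 (trl's default; the actual value is not recorded in this card):

```python
import torch
import torch.nn.functional as F

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # beta=0.1 is an assumption (trl's default); the card does not record it.
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    # with log-probs summed over completion tokens.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    return {
        "loss": -F.logsigmoid(margins).mean(),  # sigmoid DPO loss
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/accuracies": (margins > 0).float().mean(),
        "rewards/margins": margins.mean(),
    }
```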

Model description

As the model name suggests, this is a TinyLlama-based Mixture-of-Experts model (~6.43B parameters, BF16 safetensors weights), aligned with DPO on top of the UltraChat SFT checkpoint ondevicellm/tinyllama_moe_sft_ultrachat_epochs3. No further details were provided by the authors.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 96
  • num_epochs: 5
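The setup above yields an effective train batch of 8 per device × 4 GPUs × 2 accumulation steps = 64 (and 8 × 4 = 32 for eval). Below is a minimal sketch of wiring these hyperparameters into trl's DPOTrainer (trl ≈ 0.7, contemporary with Transformers 4.36); the beta value and the dataset variables are assumptions, as neither is recorded in this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "ondevicellm/tinyllama_moe_sft_ultrachat_epochs3"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="tinyllama_moe_dpo_ultrafeedback_epochs5",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation = 64 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_steps=96,
    seed=42,
    bf16=True,                       # weights are published in BF16
)

# train_ds / eval_ds: hypothetical preference datasets with string
# "prompt", "chosen", "rejected" columns (e.g. chat-templated UltraFeedback).
trainer = DPOTrainer(
    model,
    ref_model=None,        # trl clones the model as a frozen reference when None
    args=args,
    beta=0.1,              # assumed; the card does not record beta
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```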

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6916 | 0.1 | 100 | 0.6914 | 0.0044 | -0.0007 | 0.6290 | 0.0050 | -306.6867 | -348.3935 | -2.7893 | -2.8529 |
| 0.6838 | 0.21 | 200 | 0.6833 | 0.0222 | -0.0020 | 0.6548 | 0.0242 | -306.8183 | -346.6077 | -2.7745 | -2.8394 |
| 0.6683 | 0.31 | 300 | 0.6720 | 0.0087 | -0.0449 | 0.6647 | 0.0536 | -311.1083 | -347.9552 | -2.7457 | -2.8123 |
| 0.655 | 0.42 | 400 | 0.6583 | -0.0568 | -0.1516 | 0.6766 | 0.0948 | -321.7766 | -354.5066 | -2.6922 | -2.7610 |
| 0.6435 | 0.52 | 500 | 0.6453 | -0.1710 | -0.3165 | 0.6706 | 0.1455 | -338.2649 | -365.9251 | -2.6457 | -2.7160 |
| 0.641 | 0.63 | 600 | 0.6366 | -0.2120 | -0.3985 | 0.6825 | 0.1865 | -346.4684 | -370.0310 | -2.5893 | -2.6615 |
| 0.6207 | 0.73 | 700 | 0.6319 | -0.2615 | -0.4812 | 0.6706 | 0.2197 | -354.7403 | -374.9797 | -2.5381 | -2.6120 |
| 0.6025 | 0.84 | 800 | 0.6249 | -0.3306 | -0.5888 | 0.6964 | 0.2583 | -365.5025 | -381.8849 | -2.4644 | -2.5413 |
| 0.6317 | 0.94 | 900 | 0.6185 | -0.5597 | -0.8426 | 0.7063 | 0.2829 | -390.8784 | -404.7987 | -2.4027 | -2.4814 |
| 0.6087 | 1.05 | 1000 | 0.6137 | -0.5045 | -0.8126 | 0.7004 | 0.3081 | -387.8767 | -399.2817 | -2.3842 | -2.4637 |
| 0.5993 | 1.15 | 1100 | 0.6077 | -0.6040 | -0.9415 | 0.7044 | 0.3375 | -400.7663 | -409.2302 | -2.3436 | -2.4254 |
| 0.5628 | 1.26 | 1200 | 0.6026 | -0.8401 | -1.2238 | 0.7103 | 0.3837 | -429.0004 | -432.8431 | -2.2635 | -2.3475 |
| 0.5856 | 1.36 | 1300 | 0.5971 | -0.7421 | -1.1421 | 0.7242 | 0.3999 | -420.8279 | -423.0439 | -2.2233 | -2.3091 |
| 0.5672 | 1.47 | 1400 | 0.5930 | -0.7829 | -1.2202 | 0.7143 | 0.4373 | -428.6362 | -427.1146 | -2.1938 | -2.2804 |
| 0.5536 | 1.57 | 1500 | 0.5872 | -0.8347 | -1.2945 | 0.7202 | 0.4599 | -436.0717 | -432.2956 | -2.1433 | -2.2324 |
| 0.5669 | 1.67 | 1600 | 0.5858 | -0.7867 | -1.2636 | 0.7163 | 0.4769 | -432.9818 | -427.4996 | -2.1168 | -2.2065 |
| 0.5312 | 1.78 | 1700 | 0.5831 | -0.9925 | -1.4919 | 0.7262 | 0.4994 | -455.8103 | -448.0764 | -2.0492 | -2.1424 |
| 0.5596 | 1.88 | 1800 | 0.5798 | -1.0023 | -1.5297 | 0.7361 | 0.5274 | -459.5894 | -449.0625 | -2.0168 | -2.1124 |
| 0.5489 | 1.99 | 1900 | 0.5813 | -0.8832 | -1.3904 | 0.7202 | 0.5072 | -445.6621 | -437.1509 | -2.0039 | -2.0990 |
| 0.5327 | 2.09 | 2000 | 0.5795 | -0.9218 | -1.4418 | 0.7242 | 0.5200 | -450.7982 | -441.0125 | -1.9626 | -2.0594 |
| 0.5225 | 2.2 | 2100 | 0.5779 | -1.1696 | -1.7317 | 0.7401 | 0.5621 | -479.7868 | -465.7886 | -1.9207 | -2.0189 |
| 0.5085 | 2.3 | 2200 | 0.5769 | -1.1637 | -1.7425 | 0.7321 | 0.5789 | -480.8718 | -465.1949 | -1.8892 | -1.9891 |
| 0.5255 | 2.41 | 2300 | 0.5770 | -1.2191 | -1.7925 | 0.7341 | 0.5734 | -485.8650 | -470.7390 | -1.8632 | -1.9614 |
| 0.5116 | 2.51 | 2400 | 0.5742 | -1.1139 | -1.6936 | 0.7381 | 0.5798 | -475.9834 | -460.2151 | -1.8765 | -1.9749 |
| 0.5279 | 2.62 | 2500 | 0.5741 | -1.1556 | -1.7455 | 0.7361 | 0.5899 | -481.1734 | -464.3928 | -1.8664 | -1.9651 |
| 0.4795 | 2.72 | 2600 | 0.5745 | -1.1558 | -1.7459 | 0.7321 | 0.5900 | -481.2056 | -464.4143 | -1.8345 | -1.9355 |
| 0.5217 | 2.83 | 2700 | 0.5699 | -1.3475 | -1.9659 | 0.7440 | 0.6184 | -503.2092 | -483.5756 | -1.7956 | -1.8981 |
| 0.4945 | 2.93 | 2800 | 0.5699 | -1.3594 | -1.9727 | 0.7381 | 0.6132 | -503.8864 | -484.7731 | -1.8126 | -1.9141 |
| 0.477 | 3.04 | 2900 | 0.5721 | -1.3627 | -1.9877 | 0.7361 | 0.6250 | -505.3890 | -485.0972 | -1.7954 | -1.8980 |
| 0.4754 | 3.14 | 3000 | 0.5729 | -1.4117 | -2.0575 | 0.7321 | 0.6458 | -512.3726 | -490.0027 | -1.7473 | -1.8516 |
| 0.4696 | 3.24 | 3100 | 0.5708 | -1.5486 | -2.1921 | 0.7282 | 0.6435 | -525.8281 | -503.6902 | -1.7318 | -1.8363 |
| 0.4804 | 3.35 | 3200 | 0.5730 | -1.5037 | -2.1632 | 0.7321 | 0.6595 | -522.9344 | -499.1950 | -1.7097 | -1.8163 |
| 0.483 | 3.45 | 3300 | 0.5706 | -1.5793 | -2.2451 | 0.7302 | 0.6658 | -531.1252 | -506.7562 | -1.7082 | -1.8147 |
| 0.4791 | 3.56 | 3400 | 0.5723 | -1.4505 | -2.1095 | 0.7262 | 0.6590 | -517.5656 | -493.8777 | -1.7222 | -1.8274 |
| 0.4866 | 3.66 | 3500 | 0.5713 | -1.5091 | -2.1642 | 0.7381 | 0.6551 | -523.0358 | -499.7364 | -1.7191 | -1.8243 |
| 0.4651 | 3.77 | 3600 | 0.5731 | -1.4577 | -2.1177 | 0.7401 | 0.6600 | -518.3928 | -494.6030 | -1.7161 | -1.8217 |
| 0.483 | 3.87 | 3700 | 0.5708 | -1.4280 | -2.0759 | 0.7361 | 0.6479 | -514.2116 | -491.6330 | -1.7275 | -1.8325 |
| 0.4859 | 3.98 | 3800 | 0.5698 | -1.5249 | -2.1850 | 0.7460 | 0.6601 | -525.1185 | -501.3176 | -1.7144 | -1.8206 |
| 0.476 | 4.08 | 3900 | 0.5701 | -1.5060 | -2.1668 | 0.7440 | 0.6608 | -523.2975 | -499.4326 | -1.7157 | -1.8219 |
| 0.4553 | 4.19 | 4000 | 0.5705 | -1.5415 | -2.2042 | 0.7361 | 0.6626 | -527.0359 | -502.9834 | -1.7053 | -1.8120 |
| 0.4864 | 4.29 | 4100 | 0.5721 | -1.5310 | -2.1997 | 0.7381 | 0.6687 | -526.5859 | -501.9312 | -1.6982 | -1.8054 |
| 0.4402 | 4.4 | 4200 | 0.5720 | -1.5402 | -2.2110 | 0.7401 | 0.6708 | -527.7231 | -502.8538 | -1.6937 | -1.8008 |
| 0.4619 | 4.5 | 4300 | 0.5712 | -1.5462 | -2.2169 | 0.7361 | 0.6706 | -528.3046 | -503.4531 | -1.6931 | -1.8004 |
| 0.4421 | 4.6 | 4400 | 0.5710 | -1.5628 | -2.2323 | 0.7381 | 0.6695 | -529.8489 | -505.1078 | -1.6915 | -1.7989 |
| 0.4518 | 4.71 | 4500 | 0.5711 | -1.5704 | -2.2407 | 0.7361 | 0.6703 | -530.6893 | -505.8743 | -1.6913 | -1.7985 |
| 0.4508 | 4.81 | 4600 | 0.5715 | -1.5739 | -2.2436 | 0.7381 | 0.6697 | -530.9782 | -506.2146 | -1.6908 | -1.7981 |
| 0.484 | 4.92 | 4700 | 0.5716 | -1.5737 | -2.2419 | 0.7321 | 0.6682 | -530.8127 | -506.2016 | -1.6901 | -1.7976 |
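The headline evaluation numbers at the top of this card correspond to the step-3800 row (epoch 3.98), which has the lowest validation loss of the run (0.5698); the final epoch-5 checkpoints plateau marginally higher. If the training output directory is available, the best eval step can be recovered from the trainer state; a small sketch, assuming the standard trainer_state.json that Transformers writes to the (hypothetical) output directory:

```python
import json

# Hypothetical path: the output directory used during training.
with open("tinyllama_moe_dpo_ultrafeedback_epochs5/trainer_state.json") as f:
    state = json.load(f)

# log_history mixes train and eval entries; keep only the eval ones.
evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"], best["eval_loss"])  # expected here: 3800 0.5698
```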

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.0
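
A minimal inference sketch with the published checkpoint, assuming it loads through the standard AutoModelForCausalLM path and that the tokenizer carries a chat template from the UltraChat SFT stage (both assumptions; trust_remote_code may be needed if the MoE architecture is custom):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ondevicellm/tinyllama_moe_dpo_ultrafeedback_epochs5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```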