tinyllama-1.1b-sum-dpo-full_LR1e-7_3epochs

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

Loss: 0.6501
Rewards/chosen: -1.0591
Rewards/rejected: -1.2329
Rewards/accuracies: 0.6032
Rewards/margins: 0.1739
Logps/rejected: -186.0431
Logps/chosen: -164.9210
Logits/rejected: -2.3430
Logits/chosen: -2.3551
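
The reward figures above can be sanity-checked with a little arithmetic. The sketch below assumes the metrics follow the usual DPO convention, where the margin is the chosen reward minus the rejected reward and the per-pair loss is the negative log-sigmoid of that margin; the card itself does not state this, nor the beta value already folded into the rewards. The helper name dpo_pair_loss is hypothetical.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -1.0591
rewards_rejected = -1.2329
reported_margin = 0.1739

# ASSUMPTION: margin = chosen reward - rejected reward (usual DPO convention).
margin = rewards_chosen - rewards_rejected


def dpo_pair_loss(reward_margin: float) -> float:
    """Sigmoid DPO loss for one pair: -log(sigmoid(m)) = log(1 + exp(-m))."""
    return math.log1p(math.exp(-reward_margin))


print(f"recomputed margin: {margin:.4f} (reported: {reported_margin:.4f})")
print(f"loss at zero margin: {dpo_pair_loss(0.0):.4f}")
```

Under these assumptions the recomputed margin agrees with the reported one up to rounding, and a zero margin gives ln 2, which matches the first logged validation loss of 0.6931. Plugging the mean margin into dpo_pair_loss is not expected to reproduce the reported loss of 0.6501, because that figure would be an average of per-pair losses rather than the loss of the average margin.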

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
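
To illustrate what a cosine schedule with a 0.1 warmup ratio means for a peak learning rate of 1e-07, here is a minimal sketch of the common linear-warmup-then-cosine-decay shape. The total step count (17,400) is a hypothetical round figure chosen to be roughly consistent with the last logged step in the training results, and the rounding rule for the warmup length is likewise an assumption; neither is stated in this card.

```python
import math

PEAK_LR = 1e-7        # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 17_400  # ASSUMPTION: hypothetical total, not stated in this card
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)  # ASSUMPTION: rounding rule


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

With these assumed numbers the rate climbs linearly over the first tenth of training, peaks at 1e-07, and then decays along a half cosine to zero at the final step.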

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.693	0.0689	400	0.6931	0.0003	0.0002	0.5112	0.0001	-62.7270	-58.9858	-2.9691	-2.9727
0.6923	0.1378	800	0.6926	0.0024	0.0012	0.5493	0.0011	-62.6258	-58.7797	-2.9667	-2.9701
0.6901	0.2068	1200	0.6907	-0.0080	-0.0133	0.5697	0.0053	-64.0827	-59.8146	-2.9579	-2.9613
0.6835	0.2757	1600	0.6880	-0.0321	-0.0436	0.5764	0.0114	-67.1050	-62.2266	-2.9410	-2.9442
0.6865	0.3446	2000	0.6852	-0.0690	-0.0874	0.5713	0.0184	-71.4878	-65.9158	-2.9158	-2.9192
0.6767	0.4135	2400	0.6817	-0.1086	-0.1352	0.5816	0.0265	-76.2651	-69.8803	-2.8906	-2.8938
0.6726	0.4824	2800	0.6792	-0.1614	-0.1943	0.5767	0.0328	-82.1753	-75.1597	-2.8617	-2.8651
0.6643	0.5513	3200	0.6729	-0.2581	-0.3074	0.5948	0.0493	-93.4915	-84.8225	-2.8387	-2.8420
0.6614	0.6203	3600	0.6740	-0.2589	-0.3059	0.5904	0.0470	-93.3416	-84.9094	-2.8113	-2.8144
0.6609	0.6892	4000	0.6696	-0.3009	-0.3603	0.6053	0.0594	-98.7785	-89.1073	-2.7879	-2.7912
0.6562	0.7581	4400	0.6667	-0.4072	-0.4790	0.5983	0.0718	-110.6499	-99.7330	-2.7515	-2.7548
0.6569	0.8270	4800	0.6637	-0.4951	-0.5782	0.6059	0.0831	-120.5742	-108.5273	-2.7283	-2.7316
0.6383	0.8959	5200	0.6621	-0.5180	-0.6112	0.6055	0.0932	-123.8654	-110.8119	-2.7112	-2.7149
0.6411	0.9649	5600	0.6623	-0.5228	-0.6134	0.6055	0.0906	-124.0929	-111.2965	-2.6869	-2.6910
0.6293	1.0338	6000	0.6618	-0.6210	-0.7260	0.6064	0.1049	-135.3463	-121.1192	-2.6526	-2.6573
0.6247	1.1027	6400	0.6587	-0.7088	-0.8268	0.5990	0.1180	-145.4310	-129.8984	-2.6201	-2.6254
0.6194	1.1716	6800	0.6580	-0.7955	-0.9191	0.5980	0.1236	-154.6599	-138.5692	-2.5858	-2.5912
0.6127	1.2405	7200	0.6558	-0.6612	-0.7815	0.6039	0.1203	-140.8955	-125.1357	-2.5822	-2.5877
0.6531	1.3094	7600	0.6534	-0.7460	-0.8804	0.6041	0.1344	-150.7862	-133.6133	-2.5502	-2.5564
0.5995	1.3784	8000	0.6528	-0.8128	-0.9555	0.6006	0.1427	-158.2948	-140.2942	-2.5195	-2.5267
0.61	1.4473	8400	0.6540	-0.7310	-0.8603	0.5980	0.1293	-148.7821	-132.1185	-2.5198	-2.5268
0.6575	1.5162	8800	0.6527	-0.8369	-0.9764	0.5997	0.1395	-160.3900	-142.7025	-2.4947	-2.5022
0.5969	1.5851	9200	0.6516	-0.8922	-1.0366	0.6101	0.1444	-166.4089	-148.2315	-2.4661	-2.4746
0.6211	1.6540	9600	0.6526	-0.7875	-0.9248	0.6094	0.1373	-155.2340	-137.7698	-2.4725	-2.4804
0.6011	1.7229	10000	0.6517	-0.8912	-1.0379	0.6099	0.1467	-166.5410	-148.1359	-2.4396	-2.4489
0.571	1.7919	10400	0.6514	-0.8234	-0.9653	0.6122	0.1419	-159.2782	-141.3557	-2.4401	-2.4489
0.5889	1.8608	10800	0.6506	-1.0172	-1.1751	0.6055	0.1579	-180.2568	-160.7332	-2.3932	-2.4039
0.5685	1.9297	11200	0.6486	-1.0256	-1.1907	0.5992	0.1651	-181.8200	-161.5783	-2.3887	-2.3992
0.63	1.9986	11600	0.6502	-0.8869	-1.0380	0.6004	0.1511	-166.5461	-147.7054	-2.4012	-2.4108
0.5891	2.0675	12000	0.6490	-1.0453	-1.2122	0.6046	0.1670	-183.9714	-163.5418	-2.3713	-2.3825
0.5808	2.1365	12400	0.6490	-1.1906	-1.3718	0.6039	0.1811	-199.9255	-178.0778	-2.3382	-2.3508
0.6051	2.2054	12800	0.6496	-1.0959	-1.2648	0.6053	0.1689	-189.2301	-168.6040	-2.3542	-2.3658
0.6223	2.2743	13200	0.6502	-1.0865	-1.2588	0.6069	0.1723	-188.6267	-167.6660	-2.3460	-2.3579
0.6245	2.3432	13600	0.6506	-1.0806	-1.2530	0.5983	0.1724	-188.0497	-167.0715	-2.3462	-2.3583
0.5716	2.4121	14000	0.6511	-1.0306	-1.1979	0.5941	0.1672	-182.5368	-162.0786	-2.3533	-2.3651
0.6078	2.4810	14400	0.6506	-1.0889	-1.2642	0.6004	0.1753	-189.1684	-167.9059	-2.3417	-2.3540
0.6112	2.5500	14800	0.6500	-1.1067	-1.2865	0.5971	0.1798	-191.4036	-169.6898	-2.3390	-2.3514
0.5773	2.6189	15200	0.6508	-1.0435	-1.2146	0.6025	0.1712	-184.2123	-163.3605	-2.3468	-2.3588
0.5983	2.6878	15600	0.6505	-1.0660	-1.2397	0.6018	0.1737	-186.7185	-165.6157	-2.3419	-2.3540
0.5983	2.7567	16000	0.6501	-1.0707	-1.2465	0.6029	0.1758	-187.3989	-166.0839	-2.3408	-2.3530
0.5956	2.8256	16400	0.6500	-1.0594	-1.2333	0.6008	0.1739	-186.0803	-164.9520	-2.3429	-2.3550
0.6221	2.8946	16800	0.6499	-1.0592	-1.2333	0.6041	0.1742	-186.0846	-164.9336	-2.3430	-2.3551
0.6096	2.9635	17200	0.6500	-1.0595	-1.2334	0.6046	0.1739	-186.0905	-164.9614	-2.3429	-2.3549
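
The Epoch and Step columns also allow a rough estimate of the run length. The sketch below assumes that Step counts optimizer steps and that each step consumes total_train_batch_size = 16 preference pairs, as listed in the hyperparameters; because the epoch values are rounded to four decimals, the results are approximate.

```python
# (epoch, step) pairs copied from the first and last rows of the table above.
logged = [(0.0689, 400), (2.9635, 17200)]

TOTAL_TRAIN_BATCH_SIZE = 16  # total_train_batch_size from the hyperparameters
NUM_EPOCHS = 3               # num_epochs from the hyperparameters

# Each row gives an independent estimate of steps per epoch; average them.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = sum(estimates) / len(estimates)

# ASSUMPTION: one logged step is one optimizer step over 16 pairs.
pairs_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
total_steps = steps_per_epoch * NUM_EPOCHS

print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"pairs per epoch ~ {pairs_per_epoch:.0f}")
print(f"total steps     ~ {total_steps:.0f}")
```

Both rows give nearly the same steps-per-epoch estimate, which suggests the logging interval of 400 steps was constant throughout the run.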

Framework versions

Transformers 4.41.2
PyTorch 2.1.2
Datasets 2.19.2
Tokenizers 0.19.1
