gemma-2b-lora-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of google/gemma-2b on an unknown dataset. It achieves the following results on the evaluation set (these match the last logged row, step 6250, of the training results table):

Loss: 0.4752
Rewards/chosen: -0.2074
Rewards/rejected: -2.7558
Rewards/accuracies: 0.8491
Rewards/margins: 2.5483
Logps/rejected: -309.6141
Logps/chosen: -258.4032
Logits/rejected: -29.9596
Logits/chosen: -27.7808
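
In DPO-style training logs, the reward margin is typically the chosen reward minus the rejected reward, averaged over the evaluation set. As a quick sanity check, the sketch below recomputes the margin from the two reported means. The variable names are illustrative, and the small difference from the reported value is consistent with rounding in the logged figures.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -0.2074
rewards_rejected = -2.7558
reported_margin = 2.5483

# The mean of per-example margins equals the difference of the mean rewards,
# so the margin can be recomputed from the two averages.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
```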

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 250
num_epochs: 5
mixed_precision_training: Native AMP
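
The batch-size and scheduler entries above fit together as follows, assuming a single device (which is what 2 × 4 = 8 implies) and the usual shape of a linear schedule: linear warmup to the peak learning rate, then linear decay to zero. This is a plain-Python sketch, not the training code. `total_steps` is not stated in the card; the default used here is only an estimate extrapolated from the last logged row (step 6250 at epoch 4.86), and `num_devices` and `linear_lr` are hypothetical names.

```python
train_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 1  # assumption: the card does not state the device count

# Effective batch size per optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step, base_lr=1e-5, warmup_steps=250, total_steps=6430):
    """Learning rate at a given optimizer step for linear warmup + linear decay.

    total_steps=6430 is an estimate (about 1286 steps per epoch times 5 epochs),
    not a value taken from the card.
    """
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


print("effective batch size:", total_train_batch_size)
for step in (0, 125, 250, 3340, 6430):
    print(step, f"{linear_lr(step):.2e}")
```

With these settings the learning rate peaks at 1e-05 at step 250, so the first logged row of the training results coincides with the end of warmup.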

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6326	0.19	250	0.5095	-0.4198	-1.0814	0.8348	0.6616	-292.8703	-260.5270	-29.7690	-27.6954
0.4753	0.39	500	0.4478	-0.4809	-1.9324	0.8507	1.4515	-301.3807	-261.1383	-29.6874	-27.5844
0.4466	0.58	750	0.4318	-0.1824	-1.8487	0.8503	1.6663	-300.5433	-258.1532	-29.6629	-27.5793
0.4287	0.78	1000	0.4400	-0.1281	-2.0702	0.8507	1.9420	-302.7580	-257.6101	-30.0317	-27.8922
0.4417	0.97	1250	0.4321	0.1125	-1.7668	0.8495	1.8792	-299.7242	-255.2044	-30.0155	-27.875
0.4085	1.17	1500	0.4355	-0.1108	-2.1492	0.8511	2.0384	-303.5482	-257.4367	-29.9166	-27.7871
0.3946	1.36	1750	0.4488	-0.1271	-2.3911	0.8519	2.2640	-305.9676	-257.6003	-29.8426	-27.7085
0.3982	1.56	2000	0.4362	-0.0692	-2.2448	0.8515	2.1756	-304.5043	-257.0213	-30.1425	-27.9918
0.3943	1.75	2250	0.4453	0.0607	-2.1390	0.8491	2.1997	-303.4470	-255.7220	-30.2768	-28.1039
0.3741	1.94	2500	0.4273	0.0867	-2.0180	0.8507	2.1047	-302.2360	-255.4620	-30.1318	-27.9690
0.3321	2.14	2750	0.4565	-0.0808	-2.5300	0.8499	2.4492	-307.3560	-257.1368	-30.0401	-27.8877
0.3323	2.33	3000	0.4463	0.0323	-2.2984	0.8503	2.3307	-305.0405	-256.0064	-30.1648	-27.9869
0.3495	2.53	3250	0.4299	0.1988	-1.8994	0.8511	2.0982	-301.0504	-254.3410	-30.2768	-28.0945
0.3423	2.72	3500	0.4385	0.0237	-2.1481	0.8499	2.1718	-303.5371	-256.0920	-30.1685	-27.9889
0.334	2.92	3750	0.4356	0.0467	-2.1581	0.8499	2.2047	-303.6373	-255.8624	-30.1857	-27.9928
0.2933	3.11	4000	0.4540	0.0275	-2.4119	0.8503	2.4394	-306.1758	-256.0542	-30.1524	-27.9559
0.3138	3.3	4250	0.4487	-0.0797	-2.4315	0.8499	2.3517	-306.3710	-257.1263	-30.0450	-27.8772
0.28	3.5	4500	0.4696	-0.2282	-2.8278	0.8519	2.5996	-310.3340	-258.6105	-30.0594	-27.8809
0.2796	3.69	4750	0.4545	-0.0877	-2.5133	0.8499	2.4256	-307.1899	-257.2065	-30.0334	-27.8598
0.2859	3.89	5000	0.4540	-0.1038	-2.5361	0.8507	2.4323	-307.4171	-257.3667	-29.9932	-27.8206
0.2785	4.08	5250	0.4619	-0.1923	-2.7125	0.8488	2.5202	-309.1819	-258.2524	-29.9455	-27.7723
0.2751	4.28	5500	0.4614	-0.1893	-2.7226	0.8488	2.5333	-309.2824	-258.2219	-29.9548	-27.7857
0.2522	4.47	5750	0.4606	-0.1197	-2.5970	0.8507	2.4773	-308.0268	-257.5265	-30.0076	-27.8263
0.2497	4.67	6000	0.4674	-0.1855	-2.7709	0.8503	2.5854	-309.7651	-258.1835	-29.9580	-27.7820
0.2634	4.86	6250	0.4752	-0.2074	-2.7558	0.8491	2.5483	-309.6141	-258.4032	-29.9596	-27.7808
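
One way to read the table is to compare the checkpoint with the lowest validation loss against the final one. The sketch below does that with the (step, validation loss) pairs copied from the rows above. It only inspects the logged numbers and makes no claim about which checkpoint was actually published.

```python
# (step, validation loss) pairs copied from the training results table.
val_loss_by_step = [
    (250, 0.5095), (500, 0.4478), (750, 0.4318), (1000, 0.4400),
    (1250, 0.4321), (1500, 0.4355), (1750, 0.4488), (2000, 0.4362),
    (2250, 0.4453), (2500, 0.4273), (2750, 0.4565), (3000, 0.4463),
    (3250, 0.4299), (3500, 0.4385), (3750, 0.4356), (4000, 0.4540),
    (4250, 0.4487), (4500, 0.4696), (4750, 0.4545), (5000, 0.4540),
    (5250, 0.4619), (5500, 0.4614), (5750, 0.4606), (6000, 0.4674),
    (6250, 0.4752),
]

# Checkpoint with the lowest validation loss, and the last logged checkpoint.
best = min(val_loss_by_step, key=lambda pair: pair[1])
final = val_loss_by_step[-1]

print("lowest validation loss:", best)
print("final validation loss: ", final)
```

On these numbers the validation loss is lowest at step 2500 and higher at the final step, while the reward margin is larger at the end (2.5483 versus 2.1047), so which checkpoint is preferable depends on the metric you care about.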

Framework versions

PEFT 0.8.2
Transformers 4.38.0
PyTorch 2.1.0+cu121
Datasets 2.17.0
Tokenizers 0.15.2
