zephyr-7b-mypo3_sim-qlora-beta0.05

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3523
  • Rewards/chosen: -0.0972
  • Rewards/rejected: -0.4130
  • Rewards/accuracies: 0.7300
  • Rewards/margins: 0.3157
  • Logps/rejected: -9.3049
  • Logps/chosen: -2.8553
  • Logits/rejected: -1.0338
  • Logits/chosen: -1.3373
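
To try the adapter, the snippet below is a minimal loading sketch using PEFT on top of the base SFT model. The repository ids come from this card; the dtype, device placement, and generation settings are illustrative assumptions rather than settings documented here.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Kimory-X/zephyr-7b-mypo3_sim-qlora-beta0.05"

# AutoPeftModelForCausalLM reads the adapter config, loads the referenced base model,
# and attaches the LoRA weights (assumes the adapter repo also ships a tokenizer).
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain QLoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```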

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
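
For orientation, here is a hedged sketch of how these hyperparameters could be wired into a preference-optimization run with TRL's DPOConfig/DPOTrainer. The card's objective ("mypo3_sim", with beta 0.05 inferred from the model name) is not plain DPO and the original training script is not included, so the trainer choice, dataset split, precision, and LoRA settings below are assumptions for illustration only.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-qlora"
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Mirrors the hyperparameter list above; 2 per device x 4 GPUs x 4 accumulation steps = 32.
training_args = DPOConfig(
    output_dir="zephyr-7b-mypo3_sim-qlora-beta0.05",
    beta=0.05,                      # inferred from the model name, not stated in the card
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,   # x 4 GPUs = total eval batch size 16
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # precision not stated in the card; assumed
)

model = AutoModelForCausalLM.from_pretrained(model_id)   # loads base model + SFT QLoRA adapter
tokenizer = AutoTokenizer.from_pretrained(model_id)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # LoRA rank/target modules not given in the card
)
trainer.train()
```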

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1.3852 | 0.0523 | 100 | 1.3852 | -0.0054 | -0.0088 | 0.6360 | 0.0034 | -1.2220 | -1.0192 | -2.1892 | -2.2997 |
| 1.3805 | 0.1047 | 200 | 1.3819 | -0.0387 | -0.0598 | 0.6560 | 0.0211 | -2.2420 | -1.6857 | -2.0060 | -2.1048 |
| 1.3788 | 0.1570 | 300 | 1.3812 | -0.1100 | -0.1667 | 0.6640 | 0.0566 | -4.3795 | -3.1120 | -0.9278 | -1.0358 |
| 1.3744 | 0.2094 | 400 | 1.3740 | -0.0667 | -0.1434 | 0.6900 | 0.0766 | -3.9128 | -2.2458 | -0.8555 | -1.0030 |
| 1.3637 | 0.2617 | 500 | 1.3727 | -0.1457 | -0.3621 | 0.6900 | 0.2164 | -8.2872 | -3.8245 | -0.3628 | -0.6041 |
| 1.3701 | 0.3141 | 600 | 1.3657 | -0.0592 | -0.1955 | 0.7200 | 0.1363 | -4.9564 | -2.0957 | -0.7314 | -0.9403 |
| 1.3626 | 0.3664 | 700 | 1.3622 | -0.1169 | -0.3824 | 0.6920 | 0.2655 | -8.6929 | -3.2488 | -0.3517 | -0.6505 |
| 1.3703 | 0.4187 | 800 | 1.3610 | -0.0846 | -0.3205 | 0.7080 | 0.2360 | -7.4567 | -2.6024 | -0.9665 | -1.2209 |
| 1.358 | 0.4711 | 900 | 1.3569 | -0.0754 | -0.3003 | 0.7200 | 0.2248 | -7.0514 | -2.4198 | -0.9631 | -1.2156 |
| 1.3607 | 0.5234 | 1000 | 1.3580 | -0.1146 | -0.4591 | 0.7100 | 0.3445 | -10.2278 | -3.2024 | -0.8770 | -1.1903 |
| 1.3578 | 0.5758 | 1100 | 1.3535 | -0.0851 | -0.3687 | 0.7160 | 0.2837 | -8.4204 | -2.6121 | -1.0014 | -1.2897 |
| 1.3507 | 0.6281 | 1200 | 1.3560 | -0.1279 | -0.4804 | 0.7220 | 0.3525 | -10.6542 | -3.4692 | -0.9235 | -1.2501 |
| 1.3557 | 0.6805 | 1300 | 1.3534 | -0.0997 | -0.4047 | 0.7140 | 0.3050 | -9.1398 | -2.9053 | -1.0419 | -1.3375 |
| 1.3353 | 0.7328 | 1400 | 1.3529 | -0.1015 | -0.4173 | 0.7160 | 0.3158 | -9.3917 | -2.9404 | -1.0273 | -1.3289 |
| 1.3492 | 0.7851 | 1500 | 1.3521 | -0.0878 | -0.3834 | 0.7220 | 0.2956 | -8.7129 | -2.6670 | -1.0582 | -1.3502 |
| 1.3546 | 0.8375 | 1600 | 1.3524 | -0.0983 | -0.4151 | 0.7340 | 0.3168 | -9.3471 | -2.8763 | -1.0242 | -1.3287 |
| 1.3474 | 0.8898 | 1700 | 1.3522 | -0.0968 | -0.4119 | 0.7300 | 0.3151 | -9.2844 | -2.8470 | -1.0294 | -1.3334 |
| 1.3512 | 0.9422 | 1800 | 1.3522 | -0.0971 | -0.4130 | 0.7300 | 0.3159 | -9.3066 | -2.8533 | -1.0291 | -1.3335 |
| 1.3648 | 0.9945 | 1900 | 1.3523 | -0.0972 | -0.4128 | 0.7320 | 0.3156 | -9.3015 | -2.8541 | -1.0339 | -1.3374 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1