zephyr-7b-dpo-full-alpha_0.5_batch64_0.03
This model is a version of alignment-handbook/zephyr-7b-sft-full fine-tuned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.7789
- Rewards/chosen: -1.3273
- Rewards/rejected: -2.3422
- Rewards/accuracies: 0.7857
- Rewards/margins: 1.0149
- Logps/rejected: -494.4228
- Logps/chosen: -414.7076
- Logits/rejected: 0.3577
- Logits/chosen: -0.7982
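As a quick reference, below is a minimal inference sketch for loading this checkpoint with `transformers`. It assumes a GPU with enough memory for a 7B model in bfloat16, that `accelerate` is installed for `device_map="auto"`, and that the tokenizer ships the Zephyr chat template inherited from the SFT base; the prompt contents are illustrative only.

```python
# Minimal inference sketch (assumptions: bf16-capable GPU, accelerate installed,
# tokenizer carries the Zephyr chat template from the SFT base).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YeongminKim/zephyr-7b-dpo-full-alpha_0.5_batch64_0.03"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```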
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
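For orientation, here is a sketch of how these hyperparameters could be expressed with TRL's `DPOTrainer`; this is an assumed reconstruction, not the exact training script. The DPO `beta` value of 0.03 is inferred from the model name and is not stated in the card, `bf16` is assumed, and in practice the 4-GPU run would be launched through `accelerate` so that per-device batch size 8 × 4 GPUs × 2 accumulation steps gives the effective batch size of 64 listed above.

```python
# Sketch of the DPO training setup with TRL, mirroring the hyperparameters above.
# Assumes a TRL release where DPOConfig carries the DPO beta; beta=0.03 is an
# assumption inferred from the model name, and bf16 is assumed, not stated.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# ultrafeedback_binarized provides preference splits named train_prefs / test_prefs.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

# Per-device batch size 8 x 4 GPUs x 2 accumulation steps = effective batch size 64.
training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,   # assumption: mixed precision is typical for this recipe
    beta=0.03,   # assumption: inferred from the "0.03" suffix in the model name
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,  # older TRL keyword; newer releases use processing_class=
)
trainer.train()
```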
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9317 | 0.1047 | 100 | 0.9273 | -0.0556 | -0.2665 | 0.7163 | 0.2109 | -286.8501 | -287.5347 | -2.3773 | -2.4470 |
| 0.8668 | 0.2093 | 200 | 0.8656 | -0.9311 | -1.6048 | 0.7440 | 0.6737 | -420.6816 | -375.0894 | -1.2855 | -1.6337 |
| 0.8236 | 0.3140 | 300 | 0.8208 | -1.0963 | -1.9420 | 0.7758 | 0.8457 | -454.4009 | -391.6070 | -0.2995 | -1.0322 |
| 0.8334 | 0.4186 | 400 | 0.8072 | -0.9992 | -1.7408 | 0.7679 | 0.7417 | -434.2870 | -381.8950 | -0.8747 | -1.6086 |
| 0.7792 | 0.5233 | 500 | 0.7923 | -1.4240 | -2.3792 | 0.7798 | 0.9552 | -498.1183 | -424.3773 | 0.0124 | -1.0241 |
| 0.7564 | 0.6279 | 600 | 0.7844 | -1.2734 | -2.2487 | 0.7679 | 0.9753 | -485.0775 | -409.3177 | -0.0792 | -1.0851 |
| 0.7475 | 0.7326 | 700 | 0.7819 | -1.2863 | -2.2716 | 0.7857 | 0.9853 | -487.3649 | -410.6092 | 0.1648 | -0.9343 |
| 0.7483 | 0.8373 | 800 | 0.7792 | -1.2626 | -2.2457 | 0.7877 | 0.9831 | -484.7710 | -408.2386 | 0.1571 | -0.9385 |
| 0.7578 | 0.9419 | 900 | 0.7788 | -1.3317 | -2.3491 | 0.7857 | 1.0174 | -495.1149 | -415.1518 | 0.3728 | -0.7858 |
Framework versions
- Transformers 4.44.2
- PyTorch 2.2.1+cu118
- Datasets 2.14.7
- Tokenizers 0.19.1
Model tree for YeongminKim/zephyr-7b-dpo-full-alpha_0.5_batch64_0.03
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned from: alignment-handbook/zephyr-7b-sft-full