zephyr-7b-nca_pair-qlora-lr5e6-beta0.1

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3408
  • Rewards/chosen: 0.1419
  • Rewards/rejected: -0.3279
  • Rewards/accuracies: 0.7450
  • Rewards/margins: 0.4698
  • Logps/rejected: -257.3817
  • Logps/chosen: -270.7577
  • Logits/rejected: -2.2321
  • Logits/chosen: -2.3119
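
This repository contains a QLoRA adapter rather than full model weights, so one way to try it is to attach the adapter with PEFT. The snippet below is a hedged sketch, not part of the original card: it assumes the adapter config points at a resolvable base model (if the recorded base is itself an adapter repository such as alignment-handbook/zephyr-7b-sft-qlora, the base weights and adapters may need to be loaded and stacked manually) and that tokenizer files are available in this repository.

```python
# Minimal loading sketch (assumptions noted above): AutoPeftModelForCausalLM reads
# the base model name from the adapter config and attaches this LoRA adapter to it.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "Kimory-X/zephyr-7b-nca_pair-qlora-lr5e6-beta0.1"
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Zephyr-style chat prompt; apply_chat_template returns the tokenized prompt.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```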

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 5
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 40
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
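
As a hedged illustration (the actual alignment-handbook training script and the NCA pair-loss trainer are not reproduced here), the hyperparameters above map onto a standard transformers TrainingArguments object as sketched below, and the reported total batch sizes follow directly from the per-device values; the output_dir is a hypothetical placeholder.

```python
# Sketch only: mirrors the hyperparameters listed above with standard
# TrainingArguments fields; the NCA pair-loss trainer itself is not shown.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="zephyr-7b-nca_pair-qlora-lr5e6-beta0.1",  # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# The reported totals follow from per-device batch size x accumulation x devices:
num_devices = 5
total_train_batch_size = 2 * 4 * num_devices  # = 40
total_eval_batch_size = 4 * num_devices       # = 20
print(total_train_batch_size, total_eval_batch_size)
```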

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.3652 | 0.0654 | 100 | 1.3585 | 0.0179 | -0.2204 | 0.7200 | 0.2383 | -256.3067 | -271.9973 | -2.2040 | -2.2872 |
| 1.3457 | 0.1308 | 200 | 1.3529 | 0.1799 | -0.2015 | 0.7425 | 0.3814 | -256.1176 | -270.3774 | -2.2080 | -2.2912 |
| 1.3328 | 0.1963 | 300 | 1.3500 | 0.1269 | -0.2919 | 0.7150 | 0.4188 | -257.0218 | -270.9071 | -2.2303 | -2.3106 |
| 1.3452 | 0.2617 | 400 | 1.3536 | 0.1854 | -0.2395 | 0.7200 | 0.4249 | -256.4976 | -270.3225 | -2.2250 | -2.3062 |
| 1.3446 | 0.3271 | 500 | 1.3501 | 0.0859 | -0.3936 | 0.7275 | 0.4795 | -258.0389 | -271.3175 | -2.1984 | -2.2818 |
| 1.333 | 0.3925 | 600 | 1.3496 | 0.0493 | -0.3851 | 0.7450 | 0.4344 | -257.9544 | -271.6837 | -2.2107 | -2.2937 |
| 1.3577 | 0.4580 | 700 | 1.3457 | 0.1306 | -0.2688 | 0.7175 | 0.3994 | -256.7908 | -270.8706 | -2.2100 | -2.2934 |
| 1.343 | 0.5234 | 800 | 1.3449 | 0.0814 | -0.3810 | 0.7150 | 0.4623 | -257.9127 | -271.3629 | -2.2312 | -2.3121 |
| 1.3439 | 0.5888 | 900 | 1.3459 | 0.0385 | -0.4054 | 0.7250 | 0.4439 | -258.1573 | -271.7917 | -2.2327 | -2.3137 |
| 1.3388 | 0.6542 | 1000 | 1.3442 | 0.2150 | -0.2625 | 0.7325 | 0.4775 | -256.7277 | -270.0262 | -2.2387 | -2.3183 |
| 1.3186 | 0.7197 | 1100 | 1.3423 | 0.1242 | -0.3587 | 0.7325 | 0.4829 | -257.6895 | -270.9345 | -2.2306 | -2.3107 |
| 1.3299 | 0.7851 | 1200 | 1.3417 | 0.1468 | -0.3270 | 0.7425 | 0.4737 | -257.3728 | -270.7089 | -2.2275 | -2.3078 |
| 1.3248 | 0.8505 | 1300 | 1.3413 | 0.1555 | -0.3132 | 0.7525 | 0.4687 | -257.2347 | -270.6216 | -2.2306 | -2.3105 |
| 1.3398 | 0.9159 | 1400 | 1.3414 | 0.1409 | -0.3251 | 0.7475 | 0.4660 | -257.3535 | -270.7675 | -2.2317 | -2.3117 |
| 1.325 | 0.9814 | 1500 | 1.3409 | 0.1433 | -0.3268 | 0.7475 | 0.4701 | -257.3707 | -270.7436 | -2.2339 | -2.3137 |
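
As a quick sanity check on the columns above (assuming, as is conventional for preference-optimization trainers, that the reported margin is simply the chosen reward minus the rejected reward):

```python
# Final eval row (step 1500): the margin column matches chosen minus rejected rewards.
rewards_chosen, rewards_rejected = 0.1433, -0.3268
print(round(rewards_chosen - rewards_rejected, 4))  # 0.4701
```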

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1