
mistral-7b-expo-7b-IPO-25-final-1

This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:

  • Loss: 40.7478
  • Objective: 41.4189
  • Reward Accuracy: 0.6560
  • Logp Accuracy: 0.6449
  • Log Diff Policy: 12.9557
  • Chosen Logps: -164.9055
  • Rejected Logps: -177.8611
  • Chosen Rewards: -0.7022
  • Rejected Rewards: -0.8280
  • Logits: -1.8762
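
The framework list below includes PEFT, so this repository is assumed to contain a PEFT adapter trained on top of the SFT base model rather than merged weights. A minimal loading sketch under that assumption (if the weights were merged, plain `AutoModelForCausalLM.from_pretrained` works instead):

```python
# Hedged loading sketch: assumes this repo holds a PEFT adapter for the
# hZzy/mistral-7b-sft-25-1 base model (see "Framework versions" below).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "hZzy/mistral-7b-expo-7b-IPO-25-final-1"
model = AutoPeftModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Explain the IPO objective in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```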

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 108
  • total_eval_batch_size: 9
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 2
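
The training script itself is not part of this card. Below is a minimal sketch of how the hyperparameters above could be wired up, assuming TRL's `DPOTrainer` with `loss_type="ipo"`; the `beta` value, the precision flag, and the dataset split/column names are assumptions and are not reported here.

```python
# Hedged training sketch: IPO via TRL's DPOTrainer with the hyperparameters
# listed above. beta, bf16, and the dataset split/column names are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/mistral-7b-sft-25-1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt"/"chosen"/"rejected" columns (assumed for this dataset).
dataset = load_dataset("hZzy/direction_right2")

args = DPOConfig(
    output_dir="mistral-7b-expo-7b-IPO-25-final-1",
    loss_type="ipo",                  # IPO objective instead of the default sigmoid DPO loss
    beta=0.01,                        # assumed; not reported in this card
    learning_rate=5e-6,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,   # 3 per device x 3 GPUs x 12 steps = 108 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.2,
    num_train_epochs=2,
    seed=42,
    bf16=True,                        # assumed precision for multi-GPU training
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                   # TRL creates a frozen reference copy when None
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,              # newer TRL versions name this processing_class
)
trainer.train()
```

Launching the script with `accelerate launch` (or `torchrun`) across 3 GPUs matches the distributed_type: multi-GPU setting; the evaluation cadence that produced the table below is not recorded in this card.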

Training results

| Training Loss | Epoch  | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits  |
|:-------------:|:------:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 49.9605       | 0.1213 | 80   | 49.9610         | 49.9539   | 0.5350          | 0.5176        | 0.4283          | -93.8819     | -94.3102       | 0.0080         | 0.0075           | -2.2018 |
| 49.4535       | 0.2427 | 160  | 49.5361         | 49.5283   | 0.5593          | 0.5249        | 0.8538          | -90.2269     | -91.0807       | 0.0446         | 0.0398           | -2.1641 |
| 44.982        | 0.3640 | 240  | 45.8690         | 46.1325   | 0.5808          | 0.5624        | 4.6125          | -117.8028    | -122.4154      | -0.2312        | -0.2735          | -1.8757 |
| 41.1334       | 0.4853 | 320  | 43.5046         | 43.8982   | 0.6105          | 0.6032        | 7.8409          | -127.0870    | -134.9279      | -0.3240        | -0.3986          | -1.8141 |
| 39.7268       | 0.6067 | 400  | 42.7600         | 43.1496   | 0.6334          | 0.6214        | 10.4767         | -125.7633    | -136.2400      | -0.3108        | -0.4118          | -1.8837 |
| 39.7663       | 0.7280 | 480  | 41.7225         | 42.2563   | 0.6435          | 0.6390        | 10.9695         | -128.9929    | -139.9624      | -0.3431        | -0.4490          | -2.0056 |
| 39.5465       | 0.8493 | 560  | 41.3188         | 41.8171   | 0.6521          | 0.6345        | 10.6324         | -140.6873    | -151.3197      | -0.4601        | -0.5626          | -1.9821 |
| 40.4391       | 0.9707 | 640  | 41.1982         | 41.9165   | 0.6482          | 0.6471        | 12.5170         | -147.6488    | -160.1658      | -0.5297        | -0.6510          | -1.9380 |
| 38.5771       | 1.0920 | 720  | 41.2521         | 41.9791   | 0.6563          | 0.6337        | 12.3195         | -134.2294    | -146.5489      | -0.3955        | -0.5148          | -1.8990 |
| 37.5887       | 1.2133 | 800  | 41.1525         | 41.8777   | 0.6510          | 0.6334        | 11.9934         | -137.1610    | -149.1544      | -0.4248        | -0.5409          | -1.9650 |
| 38.8787       | 1.3347 | 880  | 40.8906         | 41.4724   | 0.6541          | 0.6393        | 12.1420         | -143.8819    | -156.0239      | -0.4920        | -0.6096          | -1.9757 |
| 36.5702       | 1.4560 | 960  | 40.9781         | 41.5046   | 0.6555          | 0.6423        | 12.6362         | -130.7231    | -143.3593      | -0.3604        | -0.4829          | -1.9758 |
| 35.9143       | 1.5774 | 1040 | 40.9837         | 41.6091   | 0.6502          | 0.6432        | 12.7345         | -136.4761    | -149.2107      | -0.4179        | -0.5415          | -1.9899 |
| 36.9408       | 1.6987 | 1120 | 40.8692         | 41.4085   | 0.6555          | 0.6376        | 12.1356         | -152.8772    | -165.0128      | -0.5819        | -0.6995          | -1.7977 |
| 36.6248       | 1.8200 | 1200 | 40.5552         | 41.1028   | 0.6608          | 0.6437        | 12.9051         | -143.6405    | -156.5456      | -0.4896        | -0.6148          | -1.8976 |
| 36.0414       | 1.9414 | 1280 | 40.8334         | 41.4863   | 0.6541          | 0.6395        | 12.5224         | -151.9377    | -164.4601      | -0.5726        | -0.6940          | -1.8392 |
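
A note on how the logged columns relate to one another (following the usual DPO/IPO conventions; the exact definitions are not stated in this card):

```python
# Sketch of the metric relationships, using the final evaluation values above.
chosen_logps, rejected_logps = -164.9055, -177.8611

# "Log Diff Policy" is the chosen-minus-rejected log-probability margin.
log_diff_policy = chosen_logps - rejected_logps   # 12.9556, ~12.9557 after rounding

# Under the standard implicit-reward convention, chosen/rejected rewards are
# beta-scaled log-ratios against the reference model, so "Reward Accuracy" is the
# fraction of pairs with chosen_reward > rejected_reward and "Logp Accuracy" the
# fraction with chosen_logps > rejected_logps.
```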

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1