mistral-7b-expo-7b-IPO-25-1

This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:

Loss: 40.9160
Objective: 41.5373
Reward Accuracy: 0.6597
Logp Accuracy: 0.6415
Log Diff Policy: 10.9617
Chosen Logps: -133.3625
Rejected Logps: -144.3242
Chosen Rewards: -0.3868
Rejected Rewards: -0.4926
Logits: -2.0550
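
The metric definitions are not stated in this card, but the numbers above are consistent with one plausible reading: Log Diff Policy is Chosen Logps minus Rejected Logps. The Python sketch below treats that reading as an assumption and checks the arithmetic; it also derives a reward margin (Chosen Rewards minus Rejected Rewards), a quantity the card does not report.

```python
# Evaluation metrics copied from the list above.
eval_metrics = {
    "chosen_logps": -133.3625,
    "rejected_logps": -144.3242,
    "log_diff_policy": 10.9617,
    "chosen_rewards": -0.3868,
    "rejected_rewards": -0.4926,
}

# Assumption: Log Diff Policy = chosen logps - rejected logps.
log_diff = eval_metrics["chosen_logps"] - eval_metrics["rejected_logps"]

# Assumption: reward margin = chosen rewards - rejected rewards.
# This value is derived here; it is not reported in the card.
reward_margin = eval_metrics["chosen_rewards"] - eval_metrics["rejected_rewards"]

print(f"log diff (recomputed): {log_diff:.4f}")
print(f"reward margin:         {reward_margin:.4f}")
```

If the rewards follow the usual implicit-reward convention for preference optimization (a scaled log-ratio against the reference model), negative values for both with a positive margin would mean the policy lowered the likelihood of both responses, and lowered the rejected one more.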

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

Training used the hZzy/direction_right2 dataset. More information needed on the evaluation split and preprocessing.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 3
eval_batch_size: 3
seed: 42
distributed_type: multi-GPU
num_devices: 3
gradient_accumulation_steps: 12
total_train_batch_size: 108
total_eval_batch_size: 9
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
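
Two of the values above follow from the others. The total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps, and constant_with_warmup ramps the learning rate linearly from zero before holding it constant. A minimal Python sketch of both, assuming the warmup length is the ceiling of the warmup ratio times the total number of optimizer steps; `total_steps` is a hypothetical input, since the card does not state it.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-06
TRAIN_BATCH_SIZE = 3      # per device
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
WARMUP_RATIO = 0.1

# Effective number of examples consumed per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS


def lr_at_step(step: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by a constant rate.

    total_steps is a hypothetical input; the card does not report it.
    """
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the target learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    return LEARNING_RATE


print(total_train_batch_size)
```

For example, `lr_at_step(0, 1000)` is zero and `lr_at_step(500, 1000)` returns the full rate of 5e-06.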

Training results

Training Loss	Epoch	Step	Validation Loss	Objective	Reward Accuracy	Logp Accuracy	Log Diff Policy	Chosen Logps	Rejected Logps	Chosen Rewards	Rejected Rewards	Logits
49.3588	0.1517	100	49.4313	49.4074	0.5652	0.5271	0.9748	-92.4140	-93.3888	0.0227	0.0168	-2.1900
42.9865	0.3033	200	44.3119	44.8376	0.5965	0.5867	7.1631	-121.9410	-129.1041	-0.2726	-0.3404	-1.9037
42.5687	0.4550	300	42.4967	42.9706	0.6356	0.6172	8.8037	-137.1864	-145.9901	-0.4250	-0.5093	-2.0350
38.5474	0.6067	400	41.9759	42.5688	0.6435	0.6211	10.6485	-109.0894	-119.7378	-0.1441	-0.2467	-2.0654
38.7790	0.7583	500	41.2348	41.8556	0.6544	0.6404	12.3982	-125.9650	-138.3632	-0.3128	-0.4330	-2.1536
38.0359	0.9100	600	40.8263	41.3470	0.6630	0.6502	11.9953	-132.4066	-144.4019	-0.3772	-0.4934	-2.0724
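
The Epoch and Step columns together imply the length of one epoch in optimizer steps. As a rough Python sketch, dividing Step by Epoch in each row gives an estimate of steps per epoch, and multiplying by the total train batch size of 108 gives an approximate number of training examples. Both figures are estimates derived from the rounded values in the table, not numbers reported by the authors.

```python
# (epoch, step) pairs copied from the table above.
rows = [
    (0.1517, 100),
    (0.3033, 200),
    (0.4550, 300),
    (0.6067, 400),
    (0.7583, 500),
    (0.9100, 600),
]
TOTAL_TRAIN_BATCH_SIZE = 108  # from the hyperparameter list

# Each row gives an independent estimate of steps per epoch.
estimates = [step / epoch for epoch, step in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Estimate only: the final batch may be partial and Epoch values are rounded.
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(steps_per_epoch, approx_examples)
```

Under these assumptions the last logged row (step 600, epoch 0.9100) falls shortly before the end of the single training epoch, which would explain why the headline evaluation results differ slightly from that row.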

Framework versions

PEFT 0.11.1
Transformers 4.42.0
PyTorch 2.6.0+cu124
Datasets 3.2.0
Tokenizers 0.19.1
