
mistral-7b-expo-7b-L2EXPO-25-final-2

This model is a fine-tuned version of hZzy/mistral-7b-sft-25-1 on the hZzy/direction_right2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4637
  • Objective: 0.4623
  • Reward Accuracy: 0.6510
  • Logp Accuracy: 0.5705
  • Log Diff Policy: 3.5252
  • Chosen Logps: -91.4483
  • Rejected Logps: -94.9735
  • Chosen Rewards: 0.1617
  • Rejected Rewards: 0.0045
  • Logits: -2.0668
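
The snippet below is a minimal loading sketch, assuming this repository contains a PEFT adapter trained on top of the hZzy/mistral-7b-sft-25-1 base model named on this card. The dtype, device placement, and example prompt are illustrative choices, not part of the original training or evaluation setup.

```python
# Hedged sketch: load the base SFT model and apply this repository's PEFT adapter.
# Assumption: the adapter weights live in hZzy/mistral-7b-expo-7b-L2EXPO-25-final-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "hZzy/mistral-7b-sft-25-1"
adapter_id = "hZzy/mistral-7b-expo-7b-L2EXPO-25-final-2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: dtype is not stated on the card
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```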

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 108
  • total_eval_batch_size: 9
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 2
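
For orientation, the sketch below maps the hyperparameters above onto a standard Hugging Face TrainingArguments configuration. The effective batch sizes (3 per device × 3 GPUs × 12 accumulation steps = 108 for training, 3 × 3 = 9 for evaluation) are derived from the list; the output directory and the multi-GPU launcher are assumptions, since the original training script is not published here.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# Field names are standard transformers arguments; values mirror the list.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-expo-7b-L2EXPO-25-final-2",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=3,    # x 3 GPUs x 12 accumulation steps = 108 effective
    per_device_eval_batch_size=3,     # x 3 GPUs = 9 effective
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# The 3-device multi-GPU setup listed above would typically be handled by the
# launcher (e.g. accelerate/torchrun), not by TrainingArguments itself.
```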

Training results

| Training Loss | Epoch  | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits  |
|:-------------:|:------:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 0.6143        | 0.1213 | 80   | 0.5109          | 0.5074    | 0.5414          | 0.5154        | 0.4308          | -93.9032     | -94.3340       | 0.0390         | 0.0365           | -2.2003 |
| 0.5624        | 0.2427 | 160  | 0.5040          | 0.5009    | 0.5713          | 0.5210        | 0.7341          | -92.2105     | -92.9446       | 0.1236         | 0.1060           | -2.1577 |
| 0.5296        | 0.3640 | 240  | 0.4863          | 0.4843    | 0.6079          | 0.5316        | 1.4806          | -93.1361     | -94.6168       | 0.0773         | 0.0224           | -2.1544 |
| 0.5062        | 0.4853 | 320  | 0.4769          | 0.4758    | 0.6295          | 0.5467        | 2.3729          | -88.3420     | -90.7150       | 0.3170         | 0.2175           | -2.2017 |
| 0.4890        | 0.6067 | 400  | 0.4707          | 0.4703    | 0.6418          | 0.5576        | 2.7726          | -89.4151     | -92.1878       | 0.2634         | 0.1438           | -2.2280 |
| 0.4965        | 0.7280 | 480  | 0.4679          | 0.4692    | 0.6331          | 0.5610        | 2.8915          | -89.0811     | -91.9726       | 0.2801         | 0.1546           | -2.2266 |
| 0.4905        | 0.8493 | 560  | 0.4693          | 0.4712    | 0.6390          | 0.5607        | 3.1056          | -91.1217     | -94.2273       | 0.1780         | 0.0419           | -2.2026 |
| 0.4547        | 0.9707 | 640  | 0.4653          | 0.4671    | 0.6353          | 0.5624        | 3.1047          | -92.8997     | -96.0044       | 0.0891         | -0.0470          | -2.1916 |
| 0.4497        | 1.0920 | 720  | 0.4672          | 0.4683    | 0.6404          | 0.5660        | 3.3740          | -84.0528     | -87.4268       | 0.5315         | 0.3819           | -2.1664 |
| 0.4358        | 1.2133 | 800  | 0.4617          | 0.4629    | 0.6398          | 0.5629        | 3.2392          | -85.6140     | -88.8532       | 0.4534         | 0.3106           | -2.1284 |
| 0.4572        | 1.3347 | 880  | 0.4665          | 0.4689    | 0.6398          | 0.5682        | 3.3859          | -89.4141     | -92.8000       | 0.2634         | 0.1132           | -2.1583 |
| 0.4362        | 1.4560 | 960  | 0.4647          | 0.4657    | 0.6395          | 0.5702        | 3.5918          | -86.3147     | -89.9066       | 0.4184         | 0.2579           | -2.1314 |
| 0.3976        | 1.5774 | 1040 | 0.4635          | 0.4660    | 0.6365          | 0.5685        | 3.5224          | -88.0452     | -91.5676       | 0.3319         | 0.1748           | -2.1032 |
| 0.4082        | 1.6987 | 1120 | 0.4628          | 0.4651    | 0.6429          | 0.5707        | 3.4228          | -85.0947     | -88.5175       | 0.4794         | 0.3273           | -2.0848 |
| 0.4037        | 1.8200 | 1200 | 0.4621          | 0.4621    | 0.6449          | 0.5775        | 3.6505          | -93.7257     | -97.3762       | 0.0478         | -0.1156          | -2.0348 |
| 0.3961        | 1.9414 | 1280 | 0.4607          | 0.4613    | 0.6418          | 0.5744        | 3.4066          | -90.0586     | -93.4653       | 0.2312         | 0.0800           | -2.0810 |
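
The column names above follow the conventions of DPO-style preference trainers; the sketch below illustrates how such metrics are conventionally computed from per-pair chosen/rejected log-probabilities and implicit rewards. It is an illustration of the metric definitions only, not the actual L2EXPO training code.

```python
# Hedged sketch: conventional definitions of the evaluation columns above,
# as used in DPO-style preference optimization. Not the L2EXPO implementation.
import torch

def preference_metrics(chosen_logps, rejected_logps, chosen_rewards, rejected_rewards):
    """All inputs are 1-D tensors with one entry per preference pair."""
    return {
        "reward_accuracy": (chosen_rewards > rejected_rewards).float().mean().item(),
        "logp_accuracy": (chosen_logps > rejected_logps).float().mean().item(),
        "log_diff_policy": (chosen_logps - rejected_logps).mean().item(),
        "chosen_logps": chosen_logps.mean().item(),
        "rejected_logps": rejected_logps.mean().item(),
        "chosen_rewards": chosen_rewards.mean().item(),
        "rejected_rewards": rejected_rewards.mean().item(),
    }
```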

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1
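
To compare a local environment against the pins above, a quick version check can be run in Python (a convenience sketch, not part of the original card):

```python
# Hedged sketch: print installed versions to compare against the pins listed above.
import datasets, peft, tokenizers, torch, transformers

for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("PyTorch", torch), ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```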