qwen2.5-0.5b-expo-IPO-25-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:

Loss: 44.8744
Objective: 44.8557
Reward Accuracy: 0.5884
Logp Accuracy: 0.5632
Log Diff Policy: 6.6358
Chosen Logps: -132.9855
Rejected Logps: -139.6213
Chosen Rewards: -0.4551
Rejected Rewards: -0.5182
Logits: -2.2391
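
The reported values can be cross-checked against one another. The short sketch below assumes that "Log Diff Policy" is the chosen-minus-rejected gap in log-probabilities; the card does not define the metric, but the arithmetic reproduces the reported figure. The reward margin is not reported in the card and is derived here only for illustration.

```python
# Evaluation values copied from the list above.
chosen_logps = -132.9855
rejected_logps = -139.6213
chosen_rewards = -0.4551
rejected_rewards = -0.5182

# Assumption: "Log Diff Policy" = chosen log-prob minus rejected log-prob.
log_diff_policy = chosen_logps - rejected_logps

# Derived for illustration only; the card does not report a reward margin.
reward_margin = chosen_rewards - rejected_rewards

print(f"log diff policy: {log_diff_policy:.4f}")
print(f"reward margin:   {reward_margin:.4f}")
```

A positive gap means the policy assigns, on average, a higher log-probability to the chosen response than to the rejected one.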

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
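
The two total batch sizes follow from the per-device values listed above. A minimal sketch of that arithmetic, assuming the usual convention that gradient accumulation applies to training only:

```python
# Per-device settings copied from the hyperparameter list above.
train_batch_size = 4
eval_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 12

# Effective training batch: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 288
print(total_eval_batch_size)   # 24
```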

Training results

Training Loss	Epoch	Step	Validation Loss	Objective	Reward Accuracy	Logp Accuracy	Log Diff Policy	Chosen Logps	Rejected Logps	Chosen Rewards	Rejected Rewards	Logits
48.4932	0.1577	50	48.8640	48.8817	0.5336	0.5246	1.4437	-99.4633	-100.9070	-0.1199	-0.1311	-1.3149
46.0980	0.3154	100	47.4286	47.0768	0.5682	0.5610	3.6136	-120.9156	-124.5292	-0.3344	-0.3673	-1.5087
44.3116	0.4731	150	45.7856	45.6942	0.5850	0.5811	5.0661	-113.3503	-118.4164	-0.2587	-0.3062	-1.6136
43.3978	0.6307	200	45.5774	45.2083	0.5973	0.5777	6.3228	-133.1022	-139.4250	-0.4563	-0.5162	-1.9086
42.7061	0.7884	250	44.9593	44.7260	0.6046	0.5772	6.5003	-124.7787	-131.2790	-0.3730	-0.4348	-1.9727
41.9962	0.9461	300	44.5164	44.4618	0.5979	0.5794	6.9872	-129.9575	-136.9447	-0.4248	-0.4914	-2.0568
38.2458	1.1038	350	44.7698	44.6525	0.6007	0.5794	6.6454	-127.8146	-134.4600	-0.4034	-0.4666	-2.0736
36.6528	1.2615	400	45.2601	44.9216	0.6040	0.5772	6.9298	-135.8740	-142.8038	-0.4840	-0.5500	-2.1306
37.2127	1.4192	450	44.8450	44.9502	0.5962	0.5800	6.5044	-129.3140	-135.8184	-0.4184	-0.4802	-2.1449
36.5389	1.5769	500	44.9225	44.7990	0.5872	0.5632	6.8611	-139.9226	-146.7837	-0.5245	-0.5898	-2.2197
36.1702	1.7346	550	44.9264	44.6704	0.6035	0.5710	6.9227	-140.5884	-147.5112	-0.5311	-0.5971	-2.2685
36.3218	1.8922	600	44.6893	44.7346	0.5973	0.5671	6.5700	-129.7259	-136.2959	-0.4225	-0.4850	-2.2583
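
To read the table at a glance, the sketch below finds the logged step with the lowest validation loss and the gap between training and validation loss at the last logged step. The tuples are copied from the table; the variable names are illustrative and not part of the training code.

```python
# (step, epoch, training_loss, validation_loss) copied from the table above.
rows = [
    (50, 0.1577, 48.4932, 48.8640),
    (100, 0.3154, 46.098, 47.4286),
    (150, 0.4731, 44.3116, 45.7856),
    (200, 0.6307, 43.3978, 45.5774),
    (250, 0.7884, 42.7061, 44.9593),
    (300, 0.9461, 41.9962, 44.5164),
    (350, 1.1038, 38.2458, 44.7698),
    (400, 1.2615, 36.6528, 45.2601),
    (450, 1.4192, 37.2127, 44.8450),
    (500, 1.5769, 36.5389, 44.9225),
    (550, 1.7346, 36.1702, 44.9264),
    (600, 1.8922, 36.3218, 44.6893),
]

# Logged step with the lowest validation loss.
best_step, best_epoch, _, best_val = min(rows, key=lambda r: r[3])

# Gap between validation and training loss at the last logged step.
last_step, _, last_train, last_val = rows[-1]
final_gap = last_val - last_train

print(f"lowest validation loss: {best_val} at step {best_step} (epoch {best_epoch})")
print(f"validation minus training loss at step {last_step}: {final_gap:.4f}")
```

In these logs the validation loss stays roughly flat after the first epoch while the training loss keeps falling, so the gap between the two widens during the second epoch.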

Framework versions

Transformers 4.42.0
PyTorch 2.6.0+cu124
Datasets 3.2.0
Tokenizers 0.19.1
