---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - trl
  - simpo
  - generated_from_trainer
model-index:
  - name: qwen_cpo_entropy_0_3
    results: []
---

# qwen_cpo_entropy_0_3

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.0436
- Sft Loss: 1.4855
- Rewards/chosen: -1.5353
- Rewards/rejected: -2.2017
- Rewards/accuracies: 0.6476
- Rewards/margins: 0.6664
- Logps/rejected: -2.2017
- Logps/chosen: -1.5353
- Logits/rejected: -0.4248
- Logits/chosen: -0.4735
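
The card does not include a usage snippet, so here is a minimal sketch for loading the model with `transformers`. The repository id `yakazimir/qwen_cpo_entropy_0_3` is inferred from this card and is an assumption; adjust it if the model lives elsewhere.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id inferred from this card (assumption).
model_id = "yakazimir/qwen_cpo_entropy_0_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```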

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
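
The training script itself is not part of this card. As a sketch of how the hyperparameters above could map onto TRL's `CPOTrainer` (the `simpo` tag suggests the SimPO loss variant): the dataset name, `loss_type`, and the omission of any entropy-regularization setting are all assumptions, not the documented training setup.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base_model_id = "trl-lib/qwen1.5-0.5b-sft"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Preference dataset with "prompt"/"chosen"/"rejected" columns; the dataset
# actually used for this card is not documented (assumption).
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# Hyperparameters copied from the card; loss_type="simpo" is inferred from
# the "simpo" tag (assumption).
training_args = CPOConfig(
    output_dir="qwen_cpo_entropy_0_3",
    learning_rate=3e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    loss_type="simpo",
)

trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```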

### Training results

| Training Loss | Epoch | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.0742        | 0.2141 | 400  | 1.0823          | 1.3696   | -1.3461        | -1.5309          | 0.5779             | 0.1848          | -1.5309        | -1.3461      | 0.3537          | 0.2666        |
| 1.0513        | 0.4282 | 800  | 1.0583          | 1.4056   | -1.4223        | -1.7731          | 0.6039             | 0.3508          | -1.7731        | -1.4223      | 0.2684          | 0.1774        |
| 1.0763        | 0.6422 | 1200 | 1.0495          | 1.3954   | -1.3876        | -1.7498          | 0.6046             | 0.3622          | -1.7498        | -1.3876      | 0.4388          | 0.3313        |
| 1.0436        | 0.8563 | 1600 | 1.0524          | 1.3880   | -1.3712        | -1.6780          | 0.6091             | 0.3068          | -1.6780        | -1.3712      | 0.5673          | 0.4476        |
| 1.0569        | 1.0704 | 2000 | 1.0427          | 1.4160   | -1.4133        | -1.8531          | 0.6298             | 0.4398          | -1.8531        | -1.4133      | 0.0092          | -0.0695       |
| 0.9655        | 1.2845 | 2400 | 1.0376          | 1.4169   | -1.4205        | -1.9133          | 0.6358             | 0.4928          | -1.9133        | -1.4205      | -0.2668         | -0.3251       |
| 1.0333        | 1.4986 | 2800 | 1.0458          | 1.3973   | -1.3793        | -1.7731          | 0.6128             | 0.3937          | -1.7731        | -1.3793      | -0.0046         | -0.0841       |
| 0.9824        | 1.7127 | 3200 | 1.0347          | 1.4063   | -1.3916        | -1.8345          | 0.6283             | 0.4429          | -1.8345        | -1.3916      | -0.2377         | -0.2977       |
| 0.9557        | 1.9267 | 3600 | 1.0309          | 1.4319   | -1.4343        | -1.9644          | 0.6454             | 0.5301          | -1.9644        | -1.4343      | -0.2903         | -0.3472       |
| 0.8559        | 2.1408 | 4000 | 1.0420          | 1.4888   | -1.5362        | -2.1635          | 0.6550             | 0.6272          | -2.1635        | -1.5362      | -0.1761         | -0.2434       |
| 0.8788        | 2.3549 | 4400 | 1.0414          | 1.4794   | -1.5273        | -2.1771          | 0.6469             | 0.6498          | -2.1771        | -1.5273      | -0.2963         | -0.3552       |
| 0.8747        | 2.5690 | 4800 | 1.0419          | 1.4756   | -1.5253        | -2.1757          | 0.6454             | 0.6504          | -2.1757        | -1.5253      | -0.3952         | -0.4464       |
| 0.8717        | 2.7831 | 5200 | 1.0438          | 1.4855   | -1.5370        | -2.2063          | 0.6469             | 0.6693          | -2.2063        | -1.5370      | -0.4497         | -0.4964       |
| 0.8816        | 2.9972 | 5600 | 1.0436          | 1.4855   | -1.5353        | -2.2017          | 0.6476             | 0.6664          | -2.2017        | -1.5353      | -0.4248         | -0.4735       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
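
For reproducibility, a quick sketch that checks an environment against the versions pinned above (nearby versions may well work, but were not used here):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions pinned in this card; mismatches are flagged, not fatal.
expected = {
    transformers: "4.44.2",
    torch: "2.2.2+cu121",
    datasets: "2.18.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    status = "OK" if module.__version__ == version else "MISMATCH"
    print(f"{module.__name__}: found {module.__version__}, expected {version} [{status}]")
```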