---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - trl
  - simpo
  - generated_from_trainer
model-index:
  - name: qwen_cpo_entropy_0_3
    results: []
---

# qwen_cpo_entropy_0_3

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.0436
- Sft Loss: 1.4855
- Rewards/chosen: -1.5353
- Rewards/rejected: -2.2017
- Rewards/accuracies: 0.6476
- Rewards/margins: 0.6664
- Logps/rejected: -2.2017
- Logps/chosen: -1.5353
- Logits/rejected: -0.4248
- Logits/chosen: -0.4735
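
The card does not include a usage snippet, so here is a minimal sketch for loading the model with `transformers`. The repository id `yakazimir/qwen_cpo_entropy_0_3` is inferred from this card and is an assumption; adjust it if the model lives elsewhere.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id inferred from this card (assumption).
model_id = "yakazimir/qwen_cpo_entropy_0_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```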

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
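
The training script itself is not part of this card. As a sketch of how the hyperparameters above could map onto TRL's `CPOTrainer` (the `simpo` tag suggests the SimPO loss variant): the dataset name, `loss_type`, and the omission of any entropy-regularization setting are all assumptions, not the documented training setup.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base_model_id = "trl-lib/qwen1.5-0.5b-sft"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Preference dataset with "prompt"/"chosen"/"rejected" columns; the dataset
# actually used for this card is not documented (assumption).
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# Hyperparameters copied from the card; loss_type="simpo" is inferred from
# the "simpo" tag (assumption).
training_args = CPOConfig(
    output_dir="qwen_cpo_entropy_0_3",
    learning_rate=3e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    loss_type="simpo",
)

trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```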

### Training results

| Training Loss | Epoch | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.0742        | 0.2141 | 400  | 1.0823          | 1.3696   | -1.3461        | -1.5309          | 0.5779             | 0.1848          | -1.5309        | -1.3461      | 0.3537          | 0.2666        |
| 1.0513        | 0.4282 | 800  | 1.0583          | 1.4056   | -1.4223        | -1.7731          | 0.6039             | 0.3508          | -1.7731        | -1.4223      | 0.2684          | 0.1774        |
| 1.0763        | 0.6422 | 1200 | 1.0495          | 1.3954   | -1.3876        | -1.7498          | 0.6046             | 0.3622          | -1.7498        | -1.3876      | 0.4388          | 0.3313        |
| 1.0436        | 0.8563 | 1600 | 1.0524          | 1.3880   | -1.3712        | -1.6780          | 0.6091             | 0.3068          | -1.6780        | -1.3712      | 0.5673          | 0.4476        |
| 1.0569        | 1.0704 | 2000 | 1.0427          | 1.4160   | -1.4133        | -1.8531          | 0.6298             | 0.4398          | -1.8531        | -1.4133      | 0.0092          | -0.0695       |
| 0.9655        | 1.2845 | 2400 | 1.0376          | 1.4169   | -1.4205        | -1.9133          | 0.6358             | 0.4928          | -1.9133        | -1.4205      | -0.2668         | -0.3251       |
| 1.0333        | 1.4986 | 2800 | 1.0458          | 1.3973   | -1.3793        | -1.7731          | 0.6128             | 0.3937          | -1.7731        | -1.3793      | -0.0046         | -0.0841       |
| 0.9824        | 1.7127 | 3200 | 1.0347          | 1.4063   | -1.3916        | -1.8345          | 0.6283             | 0.4429          | -1.8345        | -1.3916      | -0.2377         | -0.2977       |
| 0.9557        | 1.9267 | 3600 | 1.0309          | 1.4319   | -1.4343        | -1.9644          | 0.6454             | 0.5301          | -1.9644        | -1.4343      | -0.2903         | -0.3472       |
| 0.8559        | 2.1408 | 4000 | 1.0420          | 1.4888   | -1.5362        | -2.1635          | 0.6550             | 0.6272          | -2.1635        | -1.5362      | -0.1761         | -0.2434       |
| 0.8788        | 2.3549 | 4400 | 1.0414          | 1.4794   | -1.5273        | -2.1771          | 0.6469             | 0.6498          | -2.1771        | -1.5273      | -0.2963         | -0.3552       |
| 0.8747        | 2.5690 | 4800 | 1.0419          | 1.4756   | -1.5253        | -2.1757          | 0.6454             | 0.6504          | -2.1757        | -1.5253      | -0.3952         | -0.4464       |
| 0.8717        | 2.7831 | 5200 | 1.0438          | 1.4855   | -1.5370        | -2.2063          | 0.6469             | 0.6693          | -2.2063        | -1.5370      | -0.4497         | -0.4964       |
| 0.8816        | 2.9972 | 5600 | 1.0436          | 1.4855   | -1.5353        | -2.2017          | 0.6476             | 0.6664          | -2.2017        | -1.5353      | -0.4248         | -0.4735       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
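
For reproducibility, a quick sketch that checks an environment against the versions pinned above (nearby versions may well work, but were not used here):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions pinned in this card; mismatches are flagged, not fatal.
expected = {
    transformers: "4.44.2",
    torch: "2.2.2+cu121",
    datasets: "2.18.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    status = "OK" if module.__version__ == version else "MISMATCH"
    print(f"{module.__name__}: found {module.__version__}, expected {version} [{status}]")
```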