---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_cUNL_entropy_0_01
    results: []
---

qwen_cUNL_entropy_0_01

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset; a short loading-and-inference sketch follows the metrics below. It achieves the following results on the evaluation set:

  • Loss: 0.5806
  • Sft Loss: 4.6392
  • Rewards/chosen: -4.6682
  • Rewards/rejected: -5.7215
  • Rewards/accuracies: 0.7292
  • Rewards/margins: 1.0533
  • Logps/rejected: -5.7215
  • Logps/chosen: -4.6682
  • Logits/rejected: 0.0927
  • Logits/chosen: 0.0157
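
A minimal sketch of how this checkpoint could be loaded for inference with the Transformers version listed at the end of this card. The Hub repo id yakazimir/qwen_cUNL_entropy_0_01 is inferred from the model name above and is an assumption, as is the example prompt:

```python
# Minimal inference sketch; the repo id is inferred from the model name (assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_cUNL_entropy_0_01"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```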

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
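
The card's metadata names yakazimir/ultrafeedback_binarized as the training set. A minimal sketch for inspecting it, assuming a standard "train" split and UltraFeedback-style preference columns (both assumptions):

```python
# Sketch: load and inspect the preference dataset named in the metadata.
from datasets import load_dataset

ds = load_dataset("yakazimir/ultrafeedback_binarized", split="train")  # split name is an assumption
print(ds.column_names)  # expect prompt/chosen/rejected-style columns (assumption)
print(ds[0])
```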

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows this list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
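
As a reading aid, a minimal sketch of how the values above map onto transformers.TrainingArguments. The output_dir is a placeholder, and the SimPO-style trainer that consumed these arguments is not shown, since the exact trainer class is not stated in this card:

```python
# Sketch: the hyperparameters above expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen_cUNL_entropy_0_01",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # 2 per device x 16 steps = 32 total batch size
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```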

Training results

| Training Loss | Epoch  | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.839         | 0.2141 | 400  | 0.8444          | 1.5362   | -1.7061        | -1.9017          | 0.5564             | 0.1956          | -1.9017        | -1.7061      | 0.3788          | 0.2887        |
| 0.6224        | 0.4282 | 800  | 0.6288          | 3.4512   | -3.4767        | -4.0245          | 0.6869             | 0.5479          | -4.0245        | -3.4767      | 0.3857          | 0.3092        |
| 0.6292        | 0.6422 | 1200 | 0.5943          | 3.9913   | -3.8913        | -4.5950          | 0.7211             | 0.7038          | -4.5950        | -3.8913      | 0.3272          | 0.2462        |
| 0.5282        | 0.8563 | 1600 | 0.5852          | 3.8604   | -3.7994        | -4.4882          | 0.7174             | 0.6888          | -4.4882        | -3.7994      | 0.2184          | 0.1456        |
| 0.6187        | 1.0704 | 2000 | 0.5858          | 4.1311   | -4.1032        | -4.8789          | 0.7151             | 0.7757          | -4.8789        | -4.1032      | 0.1497          | 0.0695        |
| 0.5774        | 1.2845 | 2400 | 0.5777          | 4.3179   | -4.2615        | -5.1611          | 0.7277             | 0.8996          | -5.1611        | -4.2615      | 0.2452          | 0.1579        |
| 0.5393        | 1.4986 | 2800 | 0.5736          | 4.3506   | -4.3258        | -5.2226          | 0.7255             | 0.8968          | -5.2226        | -4.3258      | 0.3460          | 0.2569        |
| 0.5981        | 1.7127 | 3200 | 0.5695          | 4.2779   | -4.2570        | -5.1734          | 0.7270             | 0.9164          | -5.1734        | -4.2570      | 0.1928          | 0.1184        |
| 0.5856        | 1.9267 | 3600 | 0.5678          | 4.1129   | -4.0894        | -4.9749          | 0.7337             | 0.8856          | -4.9749        | -4.0894      | 0.1633          | 0.0889        |
| 0.4692        | 2.1408 | 4000 | 0.5829          | 4.6998   | -4.7020        | -5.7415          | 0.7300             | 1.0395          | -5.7415        | -4.7020      | 0.1569          | 0.0750        |
| 0.4844        | 2.3549 | 4400 | 0.5827          | 4.6692   | -4.7235        | -5.7762          | 0.7315             | 1.0527          | -5.7762        | -4.7235      | 0.1451          | 0.0641        |
| 0.488         | 2.5690 | 4800 | 0.5792          | 4.5805   | -4.6213        | -5.6703          | 0.7315             | 1.0490          | -5.6703        | -4.6213      | 0.1281          | 0.0486        |
| 0.4404        | 2.7831 | 5200 | 0.5804          | 4.6279   | -4.6623        | -5.7139          | 0.7300             | 1.0516          | -5.7139        | -4.6623      | 0.0807          | 0.0044        |
| 0.4531        | 2.9972 | 5600 | 0.5806          | 4.6392   | -4.6683        | -5.7215          | 0.7292             | 1.0533          | -5.7215        | -4.6683      | 0.0927          | 0.0156        |

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1