qwen_cfUNL_entropy

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a quick arithmetic check on these metrics follows the list):

  • Loss: 0.0000
  • Rewards/chosen: -45.0891
  • Rewards/rejected: -46.1094
  • Rewards/accuracies: 0.5682
  • Rewards/margins: 1.0204
  • Logps/rejected: -46.1094
  • Logps/chosen: -45.0891
  • Logits/rejected: 7.4245
  • Logits/chosen: 7.7499
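
These metric names follow the logging convention of TRL-style preference trainers, where Rewards/margins is Rewards/chosen minus Rewards/rejected; note also that in this run the reward columns coincide exactly with the corresponding Logps columns. A quick check on the final numbers:

```python
# Quick arithmetic check on the reported metrics (values copied from the list above).
# In TRL-style preference-training logs, Rewards/margins = Rewards/chosen - Rewards/rejected.
rewards_chosen = -45.0891
rewards_rejected = -46.1094
print(f"{rewards_chosen - rewards_rejected:.4f}")  # 1.0203 (reported: 1.0204; the inputs are themselves rounded)
```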

Model description

More information needed

Intended uses & limitations

More information needed
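
Since usage guidance is still to be written, here is a minimal generation sketch, assuming the checkpoint is published at yakazimir/qwen_cfUNL_entropy as a standard causal LM; the prompt and decoding settings are illustrative:

```python
# Minimal sketch: load the checkpoint as a standard causal LM and generate.
# The repo id comes from this card; prompt and decoding settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_cfUNL_entropy"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Summarize the goal of preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```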

Training and evaluation data

More information needed
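
The card names yakazimir/ultrafeedback_binarized as the training set; a minimal sketch for inspecting it (no split or column names assumed):

```python
# Minimal sketch: download and inspect the preference dataset named in this card.
from datasets import load_dataset

dataset = load_dataset("yakazimir/ultrafeedback_binarized")
print(dataset)  # prints the available splits, column names, and row counts
```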

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
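
The reward/logp metrics above match the logging format of TRL preference trainers, but the exact trainer class and loss for this run are not documented in the card. As a hedged illustration only, the listed values map onto transformers.TrainingArguments as follows (output_dir is a placeholder):

```python
# Hedged mapping of the listed hyperparameters onto transformers.TrainingArguments.
# The actual trainer/loss used for this run is not documented in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_cfUNL_entropy",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,   # 2 per device x 16 accumulation steps = 32 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # the published weights are BF16
)
```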

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.2141 | 400 | 0.0001 | -30.9341 | -32.7194 | 0.5697 | 1.7853 | -32.7194 | -30.9341 | 4.5291 | 4.5264 |
| 0.0 | 0.4282 | 800 | 0.0000 | -38.7534 | -40.2065 | 0.5593 | 1.4531 | -40.2065 | -38.7534 | 6.2341 | 6.3877 |
| 0.0009 | 0.6422 | 1200 | 0.0000 | -38.5460 | -39.9578 | 0.5512 | 1.4119 | -39.9578 | -38.5460 | 6.1244 | 6.2779 |
| 0.0 | 0.8563 | 1600 | 0.0000 | -40.0222 | -41.4115 | 0.5690 | 1.3893 | -41.4115 | -40.0222 | 6.5494 | 6.7346 |
| 0.0 | 1.0704 | 2000 | 0.0000 | -43.0566 | -44.2275 | 0.5653 | 1.1709 | -44.2275 | -43.0566 | 7.0818 | 7.3504 |
| 0.0 | 1.2845 | 2400 | 0.0000 | -43.5288 | -44.6477 | 0.5645 | 1.1189 | -44.6477 | -43.5288 | 7.1882 | 7.4775 |
| 0.0 | 1.4986 | 2800 | 0.0000 | -43.7383 | -44.8584 | 0.5660 | 1.1201 | -44.8584 | -43.7383 | 7.1745 | 7.4634 |
| 0.0 | 1.7127 | 3200 | 0.0000 | -44.4950 | -45.5556 | 0.5638 | 1.0605 | -45.5556 | -44.4950 | 7.2848 | 7.5950 |
| 0.0 | 1.9267 | 3600 | 0.0000 | -44.5958 | -45.6569 | 0.5645 | 1.0611 | -45.6569 | -44.5958 | 7.2814 | 7.5948 |
| 0.0 | 2.1408 | 4000 | 0.0000 | -44.8271 | -45.8411 | 0.5668 | 1.0140 | -45.8411 | -44.8271 | 7.4235 | 7.7436 |
| 0.0 | 2.3549 | 4400 | 0.0000 | -45.1344 | -46.1374 | 0.5653 | 1.0030 | -46.1374 | -45.1344 | 7.3526 | 7.6831 |
| 0.0 | 2.5690 | 4800 | 0.0000 | -45.0201 | -46.0501 | 0.5653 | 1.0300 | -46.0501 | -45.0201 | 7.3843 | 7.7103 |
| 0.0 | 2.7831 | 5200 | 0.0000 | -45.3432 | -46.3394 | 0.5653 | 0.9961 | -46.3394 | -45.3432 | 7.4499 | 7.7830 |
| 0.0 | 2.9972 | 5600 | 0.0000 | -45.0891 | -46.1094 | 0.5682 | 1.0204 | -46.1094 | -45.0891 | 7.4245 | 7.7499 |

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1
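
To reproduce the environment, a quick version check against the list above (pinning the same versions via pip is equivalent):

```python
# Verify the local environment matches the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected: 4.44.2
print(torch.__version__)         # expected: 2.2.2+cu121
print(datasets.__version__)      # expected: 2.18.0
print(tokenizers.__version__)    # expected: 0.19.1
```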