# qwen2.5-0.5b-expo-L2EXPO-25-3
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4834
- Objective: 0.4703
- Reward Accuracy: 0.6253
- Logp Accuracy: 0.6208
- Log Diff Policy: 104.4308
- Chosen Logps: -755.0930
- Rejected Logps: -859.5239
- Chosen Rewards: -0.6676
- Rejected Rewards: -0.7717
- Logits: -7.5473
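A minimal inference sketch follows. The repository id is inferred from the card title and the base model's `hZzy` namespace, so treat it as an assumption and adjust it if the checkpoint is hosted elsewhere.

```python
# Minimal inference sketch; the repo id below is assumed from the card title
# and the base model's namespace, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-3"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```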
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
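As a starting point, the preference dataset named above can be inspected with `datasets`; the split and column names below are assumptions, not confirmed by this card.

```python
# Sketch for inspecting hZzy/train_pairwise_all_new4; split/column names are assumed.
from datasets import load_dataset

dataset = load_dataset("hZzy/train_pairwise_all_new4")
print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # assumed "train" split with pairwise preference fields
```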
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
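For reference, the sketch below shows how these values map onto a `transformers.TrainingArguments` object. It is illustrative only: the actual trainer and EXPO-style preference objective used for this run are not specified in the card, and the mixed-precision setting is an assumption.

```python
# Illustrative mapping of the listed hyperparameters onto transformers.TrainingArguments.
# The preference-optimization trainer itself is not shown here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-3",
    learning_rate=2e-6,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=4,     # eval_batch_size
    gradient_accumulation_steps=12,   # 4 x 6 GPUs x 12 = 288 effective train batch
    num_train_epochs=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption: precision is not stated in the card
)
```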
### Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5021 | 0.1577 | 50 | 0.5103 | 0.5036 | 0.5436 | 0.5414 | 2.5654 | -115.5719 | -118.1373 | -0.0281 | -0.0303 | -1.5308 |
0.5018 | 0.3154 | 100 | 0.5003 | 0.4908 | 0.5872 | 0.5845 | 21.3523 | -263.8485 | -285.2008 | -0.1764 | -0.1974 | -3.0190 |
0.504 | 0.4731 | 150 | 0.4872 | 0.4773 | 0.6063 | 0.6001 | 44.6167 | -383.3965 | -428.0132 | -0.2959 | -0.3402 | -4.3297 |
0.4543 | 0.6307 | 200 | 0.4813 | 0.4703 | 0.6180 | 0.6202 | 61.6627 | -484.1489 | -545.8116 | -0.3967 | -0.4580 | -5.3493 |
0.4567 | 0.7884 | 250 | 0.4788 | 0.4692 | 0.6247 | 0.6247 | 71.4366 | -527.9006 | -599.3371 | -0.4404 | -0.5115 | -6.1216 |
0.4225 | 0.9461 | 300 | 0.4779 | 0.4668 | 0.6208 | 0.6163 | 87.7644 | -624.8009 | -712.5652 | -0.5373 | -0.6248 | -6.6622 |
0.4 | 1.1038 | 350 | 0.4803 | 0.4703 | 0.6169 | 0.6119 | 88.7974 | -605.5159 | -694.3134 | -0.5180 | -0.6065 | -6.4728 |
0.3985 | 1.2615 | 400 | 0.4814 | 0.4707 | 0.6197 | 0.6163 | 94.6204 | -706.7501 | -801.3705 | -0.6193 | -0.7136 | -7.3144 |
0.3723 | 1.4192 | 450 | 0.4802 | 0.4719 | 0.6163 | 0.6147 | 94.9399 | -637.5876 | -732.5275 | -0.5501 | -0.6447 | -6.9109 |
0.381 | 1.5769 | 500 | 0.4905 | 0.4779 | 0.6107 | 0.6113 | 105.0160 | -843.2735 | -948.2895 | -0.7558 | -0.8605 | -7.7570 |
0.3538 | 1.7346 | 550 | 0.4837 | 0.4721 | 0.6130 | 0.6130 | 95.3465 | -707.1511 | -802.4977 | -0.6197 | -0.7147 | -7.6110 |
0.347 | 1.8922 | 600 | 0.4822 | 0.4705 | 0.6180 | 0.6158 | 98.6297 | -755.7971 | -854.4269 | -0.6683 | -0.7666 | -7.5941 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1