# qwen2.5-0.5b-expo-L2EXPO-25-6
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4799
- Objective: 0.4710
- Reward Accuracy: 0.6286
- Logp Accuracy: 0.5397
- Log Diff Policy: 2.0661
- Chosen Logps: -85.7699
- Rejected Logps: -87.8360
- Chosen Rewards: 0.0853
- Rejected Rewards: -0.0018
- Logits: -1.4056
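
The card itself does not include a usage snippet. The following is a minimal inference sketch with the standard `transformers` causal-LM API; the repo id `hZzy/qwen2.5-0.5b-expo-L2EXPO-25-6` is inferred from the card title and the namespace of the base model, and the use of a chat template assumes the model keeps the Qwen2.5 template from its SFT base.

```python
# Minimal inference sketch. The repo id below is an assumption inferred from
# the card title and the hZzy namespace of the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-6"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Qwen2.5-style models normally ship a chat template; fall back to a plain
# prompt string if this checkpoint does not include one.
messages = [{"role": "user", "content": "Explain what a reward model is in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```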
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a sketch of how they map onto a `TrainingArguments` configuration follows the list:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
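
As an illustrative sketch only: the hyperparameters above could be expressed with the `transformers` `TrainingArguments` API roughly as follows. The actual EXPO-style preference-optimization script is not part of this card, so the output directory and anything not in the list above are assumptions.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# The real training script is not shown on this card; names outside the list
# above (e.g. output_dir) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-6",  # assumed
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    # With 6 GPUs: 4 * 6 * 12 = 288 total train batch size and 4 * 6 = 24
    # total eval batch size, matching the totals listed above.
    # Adam defaults (betas=(0.9, 0.999), eps=1e-8) match the optimizer line.
)
```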
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4904 | 0.1577 | 50 | 0.5026 | 0.4950 | 0.5749 | 0.5106 | 0.7652 | -88.9662 | -89.7314 | -0.0745 | -0.0965 | -1.1920 |
| 0.4748 | 0.3154 | 100 | 0.4951 | 0.4910 | 0.5872 | 0.5173 | 1.0332 | -86.7849 | -87.8181 | 0.0345 | -0.0009 | -1.2013 |
| 0.4711 | 0.4731 | 150 | 0.4881 | 0.4819 | 0.6068 | 0.5213 | 1.4074 | -86.0353 | -87.4426 | 0.0720 | 0.0179 | -1.2411 |
| 0.4119 | 0.6307 | 200 | 0.4863 | 0.4770 | 0.6147 | 0.5268 | 1.6193 | -85.2995 | -86.9188 | 0.1088 | 0.0441 | -1.2370 |
| 0.4089 | 0.7884 | 250 | 0.4838 | 0.4765 | 0.6236 | 0.5224 | 1.5593 | -83.6080 | -85.1673 | 0.1934 | 0.1317 | -1.2247 |
| 0.3753 | 0.9461 | 300 | 0.4821 | 0.4739 | 0.6202 | 0.5263 | 1.6414 | -84.4769 | -86.1183 | 0.1499 | 0.0841 | -1.2815 |
| 0.3259 | 1.1038 | 350 | 0.4836 | 0.4766 | 0.6225 | 0.5263 | 1.7739 | -86.2151 | -87.9889 | 0.0630 | -0.0094 | -1.3657 |
| 0.3219 | 1.2615 | 400 | 0.4816 | 0.4732 | 0.6320 | 0.5313 | 1.9682 | -88.3414 | -90.3096 | -0.0433 | -0.1254 | -1.3400 |
| 0.3045 | 1.4192 | 450 | 0.4811 | 0.4715 | 0.6281 | 0.5280 | 1.8281 | -85.1054 | -86.9334 | 0.1185 | 0.0434 | -1.3487 |
| 0.3031 | 1.5769 | 500 | 0.4831 | 0.4733 | 0.6309 | 0.5324 | 1.8993 | -84.7464 | -86.6457 | 0.1365 | 0.0578 | -1.3286 |
| 0.2811 | 1.7346 | 550 | 0.4834 | 0.4724 | 0.6253 | 0.5369 | 2.0397 | -87.1533 | -89.1930 | 0.0161 | -0.0696 | -1.3743 |
| 0.2666 | 1.8922 | 600 | 0.4836 | 0.4768 | 0.6253 | 0.5336 | 1.9691 | -85.3313 | -87.3004 | 0.1072 | 0.0250 | -1.4091 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1