# qwen2.5-0.5b-expo-IPO-25-1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:
- Loss: 44.8744
- Objective: 44.8557
- Reward Accuracy: 0.5884
- Logp Accuracy: 0.5632
- Log Diff Policy: 6.6358
- Chosen Logps: -132.9855
- Rejected Logps: -139.6213
- Chosen Rewards: -0.4551
- Rejected Rewards: -0.5182
- Logits: -2.2391
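
For context, these are the standard pairwise preference metrics reported by DPO/IPO-style trainers. Below is a minimal sketch of how such metrics are typically derived from per-example sequence log-probabilities; the `beta` value and all tensor names are illustrative assumptions, not values taken from this card:

```python
import torch

beta = 0.01  # hypothetical KL-penalty coefficient; not stated in this card

def preference_metrics(chosen_logps, rejected_logps,
                       ref_chosen_logps, ref_rejected_logps):
    # Implicit rewards: beta-scaled log-prob gap between policy and reference.
    chosen_rewards = beta * (chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (rejected_logps - ref_rejected_logps)
    return {
        # Fraction of pairs where the chosen response earns the higher reward.
        "reward_accuracy": (chosen_rewards > rejected_rewards).float().mean(),
        # Fraction of pairs where the policy assigns the chosen response a
        # higher raw log-probability than the rejected one.
        "logp_accuracy": (chosen_logps > rejected_logps).float().mean(),
        # Mean margin between chosen and rejected policy log-probabilities;
        # e.g. -132.9855 - (-139.6213) = 6.6358 for the final eval above.
        "log_diff_policy": (chosen_logps - rejected_logps).mean(),
    }
```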
## Model description
More information needed
## Intended uses & limitations
More information needed
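
Pending author documentation, the model can be loaded like any causal LM with 🤗 Transformers. A minimal sketch, assuming the repository id matches this card's title under the `hZzy` namespace:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-IPO-25-1"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the card does not document an expected prompt format.
inputs = tokenizer("Explain preference optimization in one sentence.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```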
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
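
The totals above follow from the per-device settings: 4 × 6 GPUs × 12 accumulation steps = 288 for training, and 4 × 6 = 24 for evaluation. A sketch mirroring these hyperparameters with 🤗 `TrainingArguments` (assumptions: launched via `accelerate launch` across 6 GPUs; the trainer class actually used is not stated in this card):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-IPO-25-1",
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # x 6 GPUs x 12 accumulation = 288 total
    per_device_eval_batch_size=4,    # x 6 GPUs = 24 total
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas/epsilon are the transformers defaults, matching the card.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```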
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48.4932 | 0.1577 | 50 | 48.8640 | 48.8817 | 0.5336 | 0.5246 | 1.4437 | -99.4633 | -100.9070 | -0.1199 | -0.1311 | -1.3149 |
| 46.098 | 0.3154 | 100 | 47.4286 | 47.0768 | 0.5682 | 0.5610 | 3.6136 | -120.9156 | -124.5292 | -0.3344 | -0.3673 | -1.5087 |
| 44.3116 | 0.4731 | 150 | 45.7856 | 45.6942 | 0.5850 | 0.5811 | 5.0661 | -113.3503 | -118.4164 | -0.2587 | -0.3062 | -1.6136 |
| 43.3978 | 0.6307 | 200 | 45.5774 | 45.2083 | 0.5973 | 0.5777 | 6.3228 | -133.1022 | -139.4250 | -0.4563 | -0.5162 | -1.9086 |
| 42.7061 | 0.7884 | 250 | 44.9593 | 44.7260 | 0.6046 | 0.5772 | 6.5003 | -124.7787 | -131.2790 | -0.3730 | -0.4348 | -1.9727 |
| 41.9962 | 0.9461 | 300 | 44.5164 | 44.4618 | 0.5979 | 0.5794 | 6.9872 | -129.9575 | -136.9447 | -0.4248 | -0.4914 | -2.0568 |
| 38.2458 | 1.1038 | 350 | 44.7698 | 44.6525 | 0.6007 | 0.5794 | 6.6454 | -127.8146 | -134.4600 | -0.4034 | -0.4666 | -2.0736 |
| 36.6528 | 1.2615 | 400 | 45.2601 | 44.9216 | 0.6040 | 0.5772 | 6.9298 | -135.8740 | -142.8038 | -0.4840 | -0.5500 | -2.1306 |
| 37.2127 | 1.4192 | 450 | 44.8450 | 44.9502 | 0.5962 | 0.5800 | 6.5044 | -129.3140 | -135.8184 | -0.4184 | -0.4802 | -2.1449 |
| 36.5389 | 1.5769 | 500 | 44.9225 | 44.7990 | 0.5872 | 0.5632 | 6.8611 | -139.9226 | -146.7837 | -0.5245 | -0.5898 | -2.2197 |
| 36.1702 | 1.7346 | 550 | 44.9264 | 44.6704 | 0.6035 | 0.5710 | 6.9227 | -140.5884 | -147.5112 | -0.5311 | -0.5971 | -2.2685 |
| 36.3218 | 1.8922 | 600 | 44.6893 | 44.7346 | 0.5973 | 0.5671 | 6.5700 | -129.7259 | -136.2959 | -0.4225 | -0.4850 | -2.2583 |
### Framework versions
- Transformers 4.42.0
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1