qwen2.5-0.5b-expo-L2EXPO-25-8
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.5029
- Objective: 0.4944
- Reward Accuracy: 0.6079
- Logp Accuracy: 0.5755
- Log Diff Policy: 9.1109
- Chosen Logps: -174.4573
- Rejected Logps: -183.5682
- Chosen Rewards: -0.8698
- Rejected Rewards: -0.9577
- Logits: -2.2881
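The card ships no usage code, so the following is a minimal sketch of loading the model for generation with the Hugging Face transformers library. The repo id `hZzy/qwen2.5-0.5b-expo-L2EXPO-25-8` is inferred from the model name and base model above and is an assumption; adjust it if the actual hub path differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, derived from the model name in this card.
model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Simple greedy-ish generation example.
prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```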
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (an illustrative configuration sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
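As a rough illustration, the hyperparameters above could be expressed as transformers `TrainingArguments`. This is a sketch only: the actual EXPO/preference-optimization trainer and its additional arguments are not shown in this card, and the `output_dir` value is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-8",  # placeholder output directory
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
)

# Effective (total) train batch size across 6 GPUs:
# 4 (per device) * 6 (devices) * 12 (accumulation steps) = 288
```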
Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4963 | 0.3154 | 100 | 0.5019 | 0.4943 | 0.5861 | 0.5375 | 2.0366 | -100.3426 | -102.3792 | -0.1287 | -0.1458 | -1.3133 |
| 0.4525 | 0.6307 | 200 | 0.4886 | 0.4819 | 0.6085 | 0.5721 | 4.4990 | -127.0285 | -131.5274 | -0.3955 | -0.4373 | -1.5453 |
| 0.4258 | 0.9461 | 300 | 0.4814 | 0.4729 | 0.6174 | 0.5895 | 6.1183 | -135.1365 | -141.2549 | -0.4766 | -0.5345 | -1.8098 |
| 0.4056 | 1.2615 | 400 | 0.4869 | 0.4751 | 0.6292 | 0.5867 | 7.5129 | -143.3356 | -150.8486 | -0.5586 | -0.6305 | -1.8987 |
| 0.3918 | 1.5769 | 500 | 0.4866 | 0.4788 | 0.6208 | 0.5839 | 7.5316 | -147.2415 | -154.7731 | -0.5977 | -0.6697 | -2.0209 |
| 0.3546 | 1.8922 | 600 | 0.4929 | 0.4876 | 0.6107 | 0.5761 | 7.9070 | -158.6364 | -166.5434 | -0.7116 | -0.7874 | -2.1730 |
| 0.301 | 2.2076 | 700 | 0.4937 | 0.4871 | 0.6057 | 0.5744 | 7.8447 | -161.5585 | -169.4032 | -0.7408 | -0.8160 | -2.1900 |
| 0.2866 | 2.5230 | 800 | 0.4984 | 0.4930 | 0.6113 | 0.5733 | 8.3379 | -168.4397 | -176.7776 | -0.8096 | -0.8898 | -2.2740 |
| 0.2571 | 2.8384 | 900 | 0.5027 | 0.4964 | 0.6119 | 0.5694 | 8.8986 | -174.1609 | -183.0595 | -0.8669 | -0.9526 | -2.2442 |
Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1