# qwen2.5-0.5b-expo-L2EXPO-25-5
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4923
- Objective: 0.4887
- Reward Accuracy: 0.6074
- Logp Accuracy: 0.5755
- Log Diff Policy: 8.2386
- Chosen Logps: -164.2356
- Rejected Logps: -172.4742
- Chosen Rewards: -0.7676
- Rejected Rewards: -0.8467
- Logits: -2.1557
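
The card does not include a usage snippet; the following is a minimal loading sketch, assuming the checkpoint works with the standard `transformers` causal-LM classes and that the repository id is `hZzy/qwen2.5-0.5b-expo-L2EXPO-25-5` (inferred from the title above).

```python
# Minimal loading sketch (assumption: standard causal-LM checkpoint; the repo id
# below is inferred from the card title and may need adjusting).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-5"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize the idea of preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```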
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (the effective batch size they imply is worked through in the sketch after this list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
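
For reference, the reported total train batch size of 288 is the product of the per-device batch size, the number of devices, and the gradient accumulation steps (4 × 6 × 12). The sketch below works through that arithmetic and shows one illustrative way these values could map onto `transformers.TrainingArguments`; it is an assumption, not the training script actually used for this run.

```python
# Effective batch size implied by the hyperparameters above:
#   4 (per-device) * 6 (GPUs) * 12 (grad accumulation steps) = 288
per_device_train_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 12
assert per_device_train_batch_size * num_devices * gradient_accumulation_steps == 288

# Illustrative mapping onto transformers.TrainingArguments (assumed; the actual
# training configuration for this card is not published here).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-5",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```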
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:------:|
| 0.4966 | 0.1577 | 50 | 0.5072 | 0.4997 | 0.5419 | 0.5246 | 1.3296 | -94.5993 | -95.9289 | -0.0712 | -0.0813 | -1.2757 |
| 0.4916 | 0.3154 | 100 | 0.4996 | 0.4914 | 0.5923 | 0.5459 | 2.6241 | -103.4037 | -106.0279 | -0.1593 | -0.1823 | -1.3990 |
| 0.495 | 0.4731 | 150 | 0.4911 | 0.4846 | 0.5917 | 0.5643 | 3.8009 | -118.6434 | -122.4443 | -0.3117 | -0.3464 | -1.4872 |
| 0.4515 | 0.6307 | 200 | 0.4857 | 0.4794 | 0.6147 | 0.5794 | 4.8895 | -128.3570 | -133.2465 | -0.4088 | -0.4545 | -1.6300 |
| 0.4525 | 0.7884 | 250 | 0.4853 | 0.4768 | 0.6191 | 0.5817 | 5.7732 | -127.3466 | -133.1198 | -0.3987 | -0.4532 | -1.8956 |
| 0.4265 | 0.9461 | 300 | 0.4800 | 0.4722 | 0.6208 | 0.5906 | 6.1759 | -134.5628 | -140.7387 | -0.4709 | -0.5294 | -1.8486 |
| 0.3982 | 1.1038 | 350 | 0.4826 | 0.4742 | 0.6152 | 0.5783 | 7.0062 | -142.1399 | -149.1461 | -0.5466 | -0.6135 | -1.8858 |
| 0.4035 | 1.2615 | 400 | 0.4837 | 0.4743 | 0.6152 | 0.5923 | 7.3228 | -147.7389 | -155.0617 | -0.6026 | -0.6726 | -1.9345 |
| 0.3797 | 1.4192 | 450 | 0.4862 | 0.4791 | 0.6091 | 0.5845 | 7.2548 | -148.0394 | -155.2942 | -0.6056 | -0.6749 | -2.0149 |
| 0.3863 | 1.5769 | 500 | 0.4864 | 0.4776 | 0.6163 | 0.5789 | 7.9205 | -150.3136 | -158.2340 | -0.6284 | -0.7043 | -2.0393 |
| 0.3587 | 1.7346 | 550 | 0.4872 | 0.4820 | 0.6102 | 0.5811 | 7.6852 | -150.1711 | -157.8564 | -0.6270 | -0.7006 | -2.1184 |
| 0.3436 | 1.8922 | 600 | 0.4934 | 0.4904 | 0.6074 | 0.5822 | 7.9098 | -162.3326 | -170.2424 | -0.7486 | -0.8244 | -2.1839 |
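
The metric names above (chosen/rejected logps, chosen/rejected rewards, reward accuracy) follow the convention of DPO-style preference optimization, in which the implicit reward is a scaled log-probability gap between the policy and a reference model. The sketch below shows how such metrics are commonly computed; the formulas and the `beta` value are assumptions for illustration, not definitions taken from this training run.

```python
import torch

def preference_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO-style preference metrics (beta and formulas are assumptions).

    chosen/rejected rewards: beta * (policy logp - reference logp)
    reward accuracy:         fraction of pairs where chosen reward > rejected reward
    logp accuracy:           fraction of pairs where chosen policy logp > rejected
    log diff policy:         mean gap between chosen and rejected policy logps
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "chosen_rewards": chosen_rewards.mean().item(),
        "rejected_rewards": rejected_rewards.mean().item(),
        "reward_accuracy": (chosen_rewards > rejected_rewards).float().mean().item(),
        "logp_accuracy": (policy_chosen_logps > policy_rejected_logps).float().mean().item(),
        "log_diff_policy": (policy_chosen_logps - policy_rejected_logps).mean().item(),
    }
```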
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1