# qwen2.5-0.5b-expo-L2EXPO-25-2
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4846
- Objective: 0.4700
- Reward Accuracy: 0.6219
- Logp Accuracy: 0.6230
- Log Diff Policy: 55.1050
- Chosen Logps: -364.5329
- Rejected Logps: -419.6379
- Chosen Rewards: -0.2771
- Rejected Rewards: -0.3318
- Logits: -4.6179
## Model description
More information needed
## Intended uses & limitations
More information needed
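No usage instructions are provided on this card. The following is a minimal generation sketch, assuming the checkpoint is published under the same `hZzy/` namespace as the base model (the repository id below is hypothetical) and loads with the standard `transformers` causal-LM classes:

```python
# Minimal generation sketch; the repository id is an assumption inferred
# from the base model's namespace, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-2"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Explain reinforcement learning in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```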
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
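The reported total batch sizes follow directly from the per-device batch sizes, the number of devices, and the gradient accumulation steps; a quick arithmetic check using only the values listed above:

```python
# Effective batch sizes implied by the hyperparameters listed above.
train_batch_size = 4             # per-device train batch size
eval_batch_size = 4              # per-device eval batch size
num_devices = 6                  # multi-GPU
gradient_accumulation_steps = 12

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 288, as reported
print(total_eval_batch_size)   # 24, as reported
```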
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5039 | 0.1577 | 50 | 0.5116 | 0.5048 | 0.5470 | 0.5218 | 1.0926 | -92.9307 | -94.0233 | -0.0055 | -0.0062 | -1.2157 |
| 0.5118 | 0.3154 | 100 | 0.5106 | 0.5038 | 0.5772 | 0.5386 | 2.1626 | -94.4899 | -96.6525 | -0.0070 | -0.0089 | -1.3464 |
| 0.5278 | 0.4731 | 150 | 0.5086 | 0.5014 | 0.5738 | 0.5576 | 5.1593 | -135.0134 | -140.1726 | -0.0475 | -0.0524 | -1.7394 |
| 0.4845 | 0.6307 | 200 | 0.5046 | 0.4964 | 0.5755 | 0.5772 | 12.1099 | -208.8495 | -220.9594 | -0.1214 | -0.1332 | -2.1451 |
| 0.4953 | 0.7884 | 250 | 0.5007 | 0.4912 | 0.5934 | 0.5923 | 19.7754 | -249.4757 | -269.2511 | -0.1620 | -0.1815 | -2.6017 |
| 0.4661 | 0.9461 | 300 | 0.4969 | 0.4857 | 0.6012 | 0.5968 | 27.9289 | -288.4738 | -316.4027 | -0.2010 | -0.2286 | -2.9416 |
| 0.4725 | 1.1038 | 350 | 0.4936 | 0.4822 | 0.6124 | 0.6023 | 33.0923 | -295.9875 | -329.0798 | -0.2085 | -0.2413 | -3.2578 |
| 0.4881 | 1.2615 | 400 | 0.4913 | 0.4795 | 0.6102 | 0.6113 | 37.9280 | -299.2147 | -337.1428 | -0.2117 | -0.2493 | -3.5394 |
| 0.4575 | 1.4192 | 450 | 0.4891 | 0.4761 | 0.6214 | 0.6119 | 42.1253 | -322.7853 | -364.9105 | -0.2353 | -0.2771 | -3.9786 |
| 0.4817 | 1.5769 | 500 | 0.4882 | 0.4743 | 0.6214 | 0.6174 | 47.9842 | -360.6322 | -408.6165 | -0.2732 | -0.3208 | -4.2328 |
| 0.4459 | 1.7346 | 550 | 0.4858 | 0.4719 | 0.6180 | 0.6158 | 51.5714 | -355.0592 | -406.6306 | -0.2676 | -0.3188 | -4.4310 |
| 0.4515 | 1.8922 | 600 | 0.4846 | 0.4700 | 0.6219 | 0.6230 | 55.1050 | -364.5329 | -419.6379 | -0.2771 | -0.3318 | -4.6179 |
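Up to rounding, Log Diff Policy in each row equals Chosen Logps minus Rejected Logps; a quick check against the final (step 600) evaluation row:

```python
# Check that Log Diff Policy = Chosen Logps - Rejected Logps
# using the step-600 row from the table above.
chosen_logps = -364.5329
rejected_logps = -419.6379

log_diff_policy = chosen_logps - rejected_logps
print(f"{log_diff_policy:.4f}")  # 55.1050, matching the reported value
```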
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1