qwen2.5-0.5b-expo-L2EXPO-25-1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-1 on the hZzy/train_pairwise_all_new2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2196
- Objective: 0.2162
- Reward Accuracy: 0.6247
- Logp Accuracy: 0.5190
- Log Diff Policy: 0.5202
- Chosen Logps: -91.5077
- Rejected Logps: -92.0279
- Chosen Rewards: 0.0552
- Rejected Rewards: 0.0180
- Logits: -1.3057
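The card does not include a usage snippet; the minimal sketch below simply loads the checkpoint named above with the standard Transformers auto classes (the same library listed under Framework versions) and runs a short generation. The prompt and generation settings are illustrative assumptions, not part of the original card.

```python
# Minimal usage sketch (assumption: the checkpoint loads with the standard
# Transformers auto classes; prompt and generation settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a one-sentence summary of reinforcement learning.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```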
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
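The training script itself is not part of the card; the sketch below only restates the hyperparameters listed above as Transformers `TrainingArguments`, assuming a standard `Trainer`-style setup. The actual pairwise/EXPO training loop, dataset wiring, optimizer construction, and precision settings are not specified in the card and are assumptions here.

```python
# Sketch only: re-expresses the listed hyperparameters as TrainingArguments.
# The real training entry point, loss (pairwise/EXPO objective), and optimizer
# wiring are not given in the card and are assumed here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-1",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 6 GPUs x 12 accumulation steps = 288 effective
    per_device_eval_batch_size=4,    # x 6 GPUs = 24 effective
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: mixed-precision mode is not stated in the card
)
```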
Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2221 | 0.1577 | 50 | 0.2308 | 0.2270 | 0.5649 | 0.5145 | 0.2791 | -92.8045 | -93.0836 | -0.0745 | -0.0875 | -1.3046 |
| 0.215 | 0.3154 | 100 | 0.2272 | 0.2238 | 0.6023 | 0.5157 | 0.3836 | -93.1475 | -93.5311 | -0.1088 | -0.1323 | -1.3103 |
| 0.2132 | 0.4731 | 150 | 0.2243 | 0.2213 | 0.6012 | 0.5168 | 0.4200 | -91.9704 | -92.3904 | 0.0089 | -0.0182 | -1.3101 |
| 0.1848 | 0.6307 | 200 | 0.2223 | 0.2194 | 0.6096 | 0.5145 | 0.4527 | -92.7829 | -93.2356 | -0.0723 | -0.1027 | -1.3491 |
| 0.1831 | 0.7884 | 250 | 0.2239 | 0.2207 | 0.6107 | 0.5190 | 0.4721 | -91.7648 | -92.2369 | 0.0295 | -0.0028 | -1.2826 |
| 0.1723 | 0.9461 | 300 | 0.2213 | 0.2186 | 0.6225 | 0.5173 | 0.4962 | -91.0092 | -91.5054 | 0.1050 | 0.0703 | -1.3274 |
| 0.1411 | 1.1038 | 350 | 0.2244 | 0.2209 | 0.6119 | 0.5185 | 0.5052 | -92.0883 | -92.5935 | -0.0029 | -0.0385 | -1.3190 |
| 0.1412 | 1.2615 | 400 | 0.2208 | 0.2173 | 0.6130 | 0.5179 | 0.5072 | -90.6094 | -91.1167 | 0.1450 | 0.1092 | -1.3359 |
| 0.1309 | 1.4192 | 450 | 0.2207 | 0.2173 | 0.6281 | 0.5201 | 0.5157 | -91.4469 | -91.9627 | 0.0613 | 0.0246 | -1.3163 |
| 0.1336 | 1.5769 | 500 | 0.2212 | 0.2179 | 0.6219 | 0.5190 | 0.5056 | -91.5764 | -92.0820 | 0.0483 | 0.0126 | -1.3171 |
| 0.1203 | 1.7346 | 550 | 0.2207 | 0.2176 | 0.6320 | 0.5196 | 0.5051 | -91.1888 | -91.6939 | 0.0871 | 0.0515 | -1.3031 |
| 0.1193 | 1.8922 | 600 | 0.2207 | 0.2173 | 0.6275 | 0.5173 | 0.5090 | -91.7077 | -92.2167 | 0.0352 | -0.0008 | -1.2903 |
| 0.0974 | 2.0499 | 650 | 0.2198 | 0.2156 | 0.6320 | 0.5179 | 0.5221 | -91.5659 | -92.0880 | 0.0494 | 0.0120 | -1.3050 |
| 0.0946 | 2.2076 | 700 | 0.2201 | 0.2163 | 0.6342 | 0.5196 | 0.5182 | -91.6268 | -92.1451 | 0.0433 | 0.0063 | -1.2980 |
| 0.0959 | 2.3653 | 750 | 0.2195 | 0.2163 | 0.6370 | 0.5179 | 0.5202 | -91.4802 | -92.0004 | 0.0580 | 0.0208 | -1.3090 |
| 0.0932 | 2.5230 | 800 | 0.2197 | 0.2165 | 0.6236 | 0.5196 | 0.5172 | -91.5272 | -92.0445 | 0.0532 | 0.0164 | -1.3048 |
| 0.0904 | 2.6807 | 850 | 0.2196 | 0.2163 | 0.6270 | 0.5190 | 0.5193 | -91.5233 | -92.0425 | 0.0536 | 0.0166 | -1.3060 |
| 0.0904 | 2.8384 | 900 | 0.2195 | 0.2162 | 0.6258 | 0.5190 | 0.5202 | -91.5277 | -92.0479 | 0.0532 | 0.0161 | -1.3057 |
| 0.0963 | 2.9961 | 950 | 0.2196 | 0.2162 | 0.6247 | 0.5190 | 0.5202 | -91.5078 | -92.0279 | 0.0552 | 0.0180 | -1.3057 |
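The card does not define the reported metrics, but "Log Diff Policy" appears to be the margin between the chosen and rejected log probabilities; the tiny check below verifies this against the final evaluation row (the interpretation is an assumption, not stated in the card).

```python
# Quick consistency check (assumption: "Log Diff Policy" is the chosen-minus-rejected
# log-probability margin), using the final evaluation row above.
chosen_logps, rejected_logps = -91.5078, -92.0279
log_diff_policy = chosen_logps - rejected_logps
print(round(log_diff_policy, 4))  # 0.5201, matching the reported 0.5202 up to rounding
```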
Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1