# qwen2.5-0.5b-sft3-25-1
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:
- Loss: 2.2516
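
Below is a minimal usage sketch for loading the checkpoint with the Transformers `AutoModel` API. The repository id is inferred from the card title and may differ; the prompt is purely illustrative.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a short completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-sft3-25-1"  # assumed repo id, based on the card title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what supervised fine-tuning is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```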
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 1e-06
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 160
- total_eval_batch_size: 20
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
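
As a rough guide, the settings above might map onto `transformers.TrainingArguments` as sketched below. The output directory and the `fp16` flag are assumptions (the card only states that Native AMP mixed precision was used); everything else restates the listed values.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# When launched on 2 GPUs, the effective train batch size is
# 10 per device x 2 devices x 8 accumulation steps = 160.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-sft3-25-1",  # assumed output path
    learning_rate=1e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    fp16=True,  # "Native AMP" mixed precision; bf16 may have been used instead
)
```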
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
3.2296 | 0.1223 | 5 | 3.2136 |
3.225 | 0.2446 | 10 | 3.2068 |
3.2177 | 0.3670 | 15 | 3.1877 |
3.1918 | 0.4893 | 20 | 3.1448 |
3.1562 | 0.6116 | 25 | 3.1100 |
3.0931 | 0.7339 | 30 | 3.0523 |
3.053 | 0.8563 | 35 | 3.0036 |
2.9924 | 0.9786 | 40 | 2.9515 |
2.9383 | 1.1009 | 45 | 2.8940 |
2.8913 | 1.2232 | 50 | 2.8538 |
2.855 | 1.3456 | 55 | 2.8187 |
2.8094 | 1.4679 | 60 | 2.7831 |
2.7692 | 1.5902 | 65 | 2.7499 |
2.7438 | 1.7125 | 70 | 2.7179 |
2.7143 | 1.8349 | 75 | 2.6884 |
2.674 | 1.9572 | 80 | 2.6618 |
2.6404 | 2.0795 | 85 | 2.6370 |
2.6205 | 2.2018 | 90 | 2.6131 |
2.6003 | 2.3242 | 95 | 2.5895 |
2.5648 | 2.4465 | 100 | 2.5661 |
2.5476 | 2.5688 | 105 | 2.5438 |
2.5116 | 2.6911 | 110 | 2.5228 |
2.4919 | 2.8135 | 115 | 2.5031 |
2.4722 | 2.9358 | 120 | 2.4846 |
2.4382 | 3.0581 | 125 | 2.4667 |
2.4433 | 3.1804 | 130 | 2.4536 |
2.4143 | 3.3028 | 135 | 2.4384 |
2.4077 | 3.4251 | 140 | 2.4241 |
2.3855 | 3.5474 | 145 | 2.4112 |
2.3611 | 3.6697 | 150 | 2.3989 |
2.351 | 3.7920 | 155 | 2.3876 |
2.3255 | 3.9144 | 160 | 2.3768 |
2.3224 | 4.0367 | 165 | 2.3677 |
2.3123 | 4.1590 | 170 | 2.3587 |
2.3075 | 4.2813 | 175 | 2.3503 |
2.2891 | 4.4037 | 180 | 2.3425 |
2.2675 | 4.5260 | 185 | 2.3353 |
2.279 | 4.6483 | 190 | 2.3282 |
2.2585 | 4.7706 | 195 | 2.3222 |
2.252 | 4.8930 | 200 | 2.3162 |
2.2497 | 5.0153 | 205 | 2.3108 |
2.2345 | 5.1376 | 210 | 2.3059 |
2.2238 | 5.2599 | 215 | 2.3011 |
2.2264 | 5.3823 | 220 | 2.2967 |
2.2188 | 5.5046 | 225 | 2.2930 |
2.206 | 5.6269 | 230 | 2.2890 |
2.1943 | 5.7492 | 235 | 2.2855 |
2.2052 | 5.8716 | 240 | 2.2822 |
2.1997 | 5.9939 | 245 | 2.2792 |
2.1836 | 6.1162 | 250 | 2.2769 |
2.1831 | 6.2385 | 255 | 2.2742 |
2.1596 | 6.3609 | 260 | 2.2719 |
2.1936 | 6.4832 | 265 | 2.2699 |
2.1768 | 6.6055 | 270 | 2.2677 |
2.1727 | 6.7278 | 275 | 2.2659 |
2.1649 | 6.8502 | 280 | 2.2641 |
2.1686 | 6.9725 | 285 | 2.2626 |
2.1868 | 7.0948 | 290 | 2.2614 |
2.1634 | 7.2171 | 295 | 2.2600 |
2.15 | 7.3394 | 300 | 2.2588 |
2.1495 | 7.4618 | 305 | 2.2579 |
2.1494 | 7.5841 | 310 | 2.2569 |
2.1463 | 7.7064 | 315 | 2.2560 |
2.1578 | 7.8287 | 320 | 2.2553 |
2.1386 | 7.9511 | 325 | 2.2547 |
2.1451 | 8.0734 | 330 | 2.2542 |
2.1499 | 8.1957 | 335 | 2.2537 |
2.1457 | 8.3180 | 340 | 2.2532 |
2.1454 | 8.4404 | 345 | 2.2528 |
2.135 | 8.5627 | 350 | 2.2525 |
2.1404 | 8.6850 | 355 | 2.2523 |
2.155 | 8.8073 | 360 | 2.2521 |
2.1435 | 8.9297 | 365 | 2.2519 |
2.1439 | 9.0520 | 370 | 2.2518 |
2.1343 | 9.1743 | 375 | 2.2517 |
2.1478 | 9.2966 | 380 | 2.2517 |
2.1428 | 9.4190 | 385 | 2.2516 |
2.1312 | 9.5413 | 390 | 2.2516 |
2.1518 | 9.6636 | 395 | 2.2516 |
2.1357 | 9.7859 | 400 | 2.2516 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1