qwen2.5-0.5b-sft3-25-1

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2516
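
The checkpoint can be loaded like any other causal language model on the Hub. The snippet below is a minimal usage sketch, not part of the original card; the prompt text is an arbitrary placeholder.

```python
# Minimal usage sketch: load the checkpoint and generate a short completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-sft3-25-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt")  # placeholder prompt
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```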

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 160
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
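
For illustration, here is a hypothetical sketch of how these settings map onto transformers.TrainingArguments. The actual training script is not published, so output_dir and the fp16 flag (for Native AMP) are assumptions.

```python
# Hypothetical sketch: mapping the listed hyperparameters onto
# transformers.TrainingArguments (the actual training script is not published).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-sft3-25-1",  # assumed output path
    learning_rate=1e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,  # assumed flag for "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# With 2 GPUs, the effective batch sizes match the card:
#   train: 10 per device * 2 devices * 8 accumulation steps = 160
#   eval:  10 per device * 2 devices = 20
```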

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.2296 | 0.1223 | 5 | 3.2136 |
| 3.225 | 0.2446 | 10 | 3.2068 |
| 3.2177 | 0.3670 | 15 | 3.1877 |
| 3.1918 | 0.4893 | 20 | 3.1448 |
| 3.1562 | 0.6116 | 25 | 3.1100 |
| 3.0931 | 0.7339 | 30 | 3.0523 |
| 3.053 | 0.8563 | 35 | 3.0036 |
| 2.9924 | 0.9786 | 40 | 2.9515 |
| 2.9383 | 1.1009 | 45 | 2.8940 |
| 2.8913 | 1.2232 | 50 | 2.8538 |
| 2.855 | 1.3456 | 55 | 2.8187 |
| 2.8094 | 1.4679 | 60 | 2.7831 |
| 2.7692 | 1.5902 | 65 | 2.7499 |
| 2.7438 | 1.7125 | 70 | 2.7179 |
| 2.7143 | 1.8349 | 75 | 2.6884 |
| 2.674 | 1.9572 | 80 | 2.6618 |
| 2.6404 | 2.0795 | 85 | 2.6370 |
| 2.6205 | 2.2018 | 90 | 2.6131 |
| 2.6003 | 2.3242 | 95 | 2.5895 |
| 2.5648 | 2.4465 | 100 | 2.5661 |
| 2.5476 | 2.5688 | 105 | 2.5438 |
| 2.5116 | 2.6911 | 110 | 2.5228 |
| 2.4919 | 2.8135 | 115 | 2.5031 |
| 2.4722 | 2.9358 | 120 | 2.4846 |
| 2.4382 | 3.0581 | 125 | 2.4667 |
| 2.4433 | 3.1804 | 130 | 2.4536 |
| 2.4143 | 3.3028 | 135 | 2.4384 |
| 2.4077 | 3.4251 | 140 | 2.4241 |
| 2.3855 | 3.5474 | 145 | 2.4112 |
| 2.3611 | 3.6697 | 150 | 2.3989 |
| 2.351 | 3.7920 | 155 | 2.3876 |
| 2.3255 | 3.9144 | 160 | 2.3768 |
| 2.3224 | 4.0367 | 165 | 2.3677 |
| 2.3123 | 4.1590 | 170 | 2.3587 |
| 2.3075 | 4.2813 | 175 | 2.3503 |
| 2.2891 | 4.4037 | 180 | 2.3425 |
| 2.2675 | 4.5260 | 185 | 2.3353 |
| 2.279 | 4.6483 | 190 | 2.3282 |
| 2.2585 | 4.7706 | 195 | 2.3222 |
| 2.252 | 4.8930 | 200 | 2.3162 |
| 2.2497 | 5.0153 | 205 | 2.3108 |
| 2.2345 | 5.1376 | 210 | 2.3059 |
| 2.2238 | 5.2599 | 215 | 2.3011 |
| 2.2264 | 5.3823 | 220 | 2.2967 |
| 2.2188 | 5.5046 | 225 | 2.2930 |
| 2.206 | 5.6269 | 230 | 2.2890 |
| 2.1943 | 5.7492 | 235 | 2.2855 |
| 2.2052 | 5.8716 | 240 | 2.2822 |
| 2.1997 | 5.9939 | 245 | 2.2792 |
| 2.1836 | 6.1162 | 250 | 2.2769 |
| 2.1831 | 6.2385 | 255 | 2.2742 |
| 2.1596 | 6.3609 | 260 | 2.2719 |
| 2.1936 | 6.4832 | 265 | 2.2699 |
| 2.1768 | 6.6055 | 270 | 2.2677 |
| 2.1727 | 6.7278 | 275 | 2.2659 |
| 2.1649 | 6.8502 | 280 | 2.2641 |
| 2.1686 | 6.9725 | 285 | 2.2626 |
| 2.1868 | 7.0948 | 290 | 2.2614 |
| 2.1634 | 7.2171 | 295 | 2.2600 |
| 2.15 | 7.3394 | 300 | 2.2588 |
| 2.1495 | 7.4618 | 305 | 2.2579 |
| 2.1494 | 7.5841 | 310 | 2.2569 |
| 2.1463 | 7.7064 | 315 | 2.2560 |
| 2.1578 | 7.8287 | 320 | 2.2553 |
| 2.1386 | 7.9511 | 325 | 2.2547 |
| 2.1451 | 8.0734 | 330 | 2.2542 |
| 2.1499 | 8.1957 | 335 | 2.2537 |
| 2.1457 | 8.3180 | 340 | 2.2532 |
| 2.1454 | 8.4404 | 345 | 2.2528 |
| 2.135 | 8.5627 | 350 | 2.2525 |
| 2.1404 | 8.6850 | 355 | 2.2523 |
| 2.155 | 8.8073 | 360 | 2.2521 |
| 2.1435 | 8.9297 | 365 | 2.2519 |
| 2.1439 | 9.0520 | 370 | 2.2518 |
| 2.1343 | 9.1743 | 375 | 2.2517 |
| 2.1478 | 9.2966 | 380 | 2.2517 |
| 2.1428 | 9.4190 | 385 | 2.2516 |
| 2.1312 | 9.5413 | 390 | 2.2516 |
| 2.1518 | 9.6636 | 395 | 2.2516 |
| 2.1357 | 9.7859 | 400 | 2.2516 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1
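
To check a local environment against the versions above, a small optional sketch (not part of the original card; assumes the packages are installed):

```python
# Optional environment check against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": ("4.42.0", transformers.__version__),
    "torch": ("2.6.0+cu124", torch.__version__),
    "datasets": ("3.2.0", datasets.__version__),
    "tokenizers": ("0.19.1", tokenizers.__version__),
}
for name, (want, have) in expected.items():
    print(f"{name}: expected {want}, found {have}")
```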