qwen2.5-0.5b-sft3-25-1

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2516
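
The checkpoint can be loaded like any other causal language model on the Hub. The snippet below is a minimal usage sketch, not part of the original card; the prompt text is an arbitrary placeholder.

```python
# Minimal usage sketch: load the checkpoint and generate a short completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-sft3-25-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt")  # placeholder prompt
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```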

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 160
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
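
For illustration, here is a hypothetical sketch of how these settings map onto transformers.TrainingArguments. The actual training script is not published, so output_dir and the fp16 flag (for Native AMP) are assumptions.

```python
# Hypothetical sketch: mapping the listed hyperparameters onto
# transformers.TrainingArguments (the actual training script is not published).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-sft3-25-1",  # assumed output path
    learning_rate=1e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,  # assumed flag for "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# With 2 GPUs, the effective batch sizes match the card:
#   train: 10 per device * 2 devices * 8 accumulation steps = 160
#   eval:  10 per device * 2 devices = 20
```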

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.2296 | 0.1223 | 5 | 3.2136 |
| 3.225 | 0.2446 | 10 | 3.2068 |
| 3.2177 | 0.3670 | 15 | 3.1877 |
| 3.1918 | 0.4893 | 20 | 3.1448 |
| 3.1562 | 0.6116 | 25 | 3.1100 |
| 3.0931 | 0.7339 | 30 | 3.0523 |
| 3.053 | 0.8563 | 35 | 3.0036 |
| 2.9924 | 0.9786 | 40 | 2.9515 |
| 2.9383 | 1.1009 | 45 | 2.8940 |
| 2.8913 | 1.2232 | 50 | 2.8538 |
| 2.855 | 1.3456 | 55 | 2.8187 |
| 2.8094 | 1.4679 | 60 | 2.7831 |
| 2.7692 | 1.5902 | 65 | 2.7499 |
| 2.7438 | 1.7125 | 70 | 2.7179 |
| 2.7143 | 1.8349 | 75 | 2.6884 |
| 2.674 | 1.9572 | 80 | 2.6618 |
| 2.6404 | 2.0795 | 85 | 2.6370 |
| 2.6205 | 2.2018 | 90 | 2.6131 |
| 2.6003 | 2.3242 | 95 | 2.5895 |
| 2.5648 | 2.4465 | 100 | 2.5661 |
| 2.5476 | 2.5688 | 105 | 2.5438 |
| 2.5116 | 2.6911 | 110 | 2.5228 |
| 2.4919 | 2.8135 | 115 | 2.5031 |
| 2.4722 | 2.9358 | 120 | 2.4846 |
| 2.4382 | 3.0581 | 125 | 2.4667 |
| 2.4433 | 3.1804 | 130 | 2.4536 |
| 2.4143 | 3.3028 | 135 | 2.4384 |
| 2.4077 | 3.4251 | 140 | 2.4241 |
| 2.3855 | 3.5474 | 145 | 2.4112 |
| 2.3611 | 3.6697 | 150 | 2.3989 |
| 2.351 | 3.7920 | 155 | 2.3876 |
| 2.3255 | 3.9144 | 160 | 2.3768 |
| 2.3224 | 4.0367 | 165 | 2.3677 |
| 2.3123 | 4.1590 | 170 | 2.3587 |
| 2.3075 | 4.2813 | 175 | 2.3503 |
| 2.2891 | 4.4037 | 180 | 2.3425 |
| 2.2675 | 4.5260 | 185 | 2.3353 |
| 2.279 | 4.6483 | 190 | 2.3282 |
| 2.2585 | 4.7706 | 195 | 2.3222 |
| 2.252 | 4.8930 | 200 | 2.3162 |
| 2.2497 | 5.0153 | 205 | 2.3108 |
| 2.2345 | 5.1376 | 210 | 2.3059 |
| 2.2238 | 5.2599 | 215 | 2.3011 |
| 2.2264 | 5.3823 | 220 | 2.2967 |
| 2.2188 | 5.5046 | 225 | 2.2930 |
| 2.206 | 5.6269 | 230 | 2.2890 |
| 2.1943 | 5.7492 | 235 | 2.2855 |
| 2.2052 | 5.8716 | 240 | 2.2822 |
| 2.1997 | 5.9939 | 245 | 2.2792 |
| 2.1836 | 6.1162 | 250 | 2.2769 |
| 2.1831 | 6.2385 | 255 | 2.2742 |
| 2.1596 | 6.3609 | 260 | 2.2719 |
| 2.1936 | 6.4832 | 265 | 2.2699 |
| 2.1768 | 6.6055 | 270 | 2.2677 |
| 2.1727 | 6.7278 | 275 | 2.2659 |
| 2.1649 | 6.8502 | 280 | 2.2641 |
| 2.1686 | 6.9725 | 285 | 2.2626 |
| 2.1868 | 7.0948 | 290 | 2.2614 |
| 2.1634 | 7.2171 | 295 | 2.2600 |
| 2.15 | 7.3394 | 300 | 2.2588 |
| 2.1495 | 7.4618 | 305 | 2.2579 |
| 2.1494 | 7.5841 | 310 | 2.2569 |
| 2.1463 | 7.7064 | 315 | 2.2560 |
| 2.1578 | 7.8287 | 320 | 2.2553 |
| 2.1386 | 7.9511 | 325 | 2.2547 |
| 2.1451 | 8.0734 | 330 | 2.2542 |
| 2.1499 | 8.1957 | 335 | 2.2537 |
| 2.1457 | 8.3180 | 340 | 2.2532 |
| 2.1454 | 8.4404 | 345 | 2.2528 |
| 2.135 | 8.5627 | 350 | 2.2525 |
| 2.1404 | 8.6850 | 355 | 2.2523 |
| 2.155 | 8.8073 | 360 | 2.2521 |
| 2.1435 | 8.9297 | 365 | 2.2519 |
| 2.1439 | 9.0520 | 370 | 2.2518 |
| 2.1343 | 9.1743 | 375 | 2.2517 |
| 2.1478 | 9.2966 | 380 | 2.2517 |
| 2.1428 | 9.4190 | 385 | 2.2516 |
| 2.1312 | 9.5413 | 390 | 2.2516 |
| 2.1518 | 9.6636 | 395 | 2.2516 |
| 2.1357 | 9.7859 | 400 | 2.2516 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1
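
To check a local environment against the versions above, a small optional sketch (not part of the original card; assumes the packages are installed):

```python
# Optional environment check against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": ("4.42.0", transformers.__version__),
    "torch": ("2.6.0+cu124", torch.__version__),
    "datasets": ("3.2.0", datasets.__version__),
    "tokenizers": ("0.19.1", tokenizers.__version__),
}
for name, (want, have) in expected.items():
    print(f"{name}: expected {want}, found {have}")
```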