lemexp-task1-template_small-Qwen2.5-1.5B-ddp-8lr

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0008
train_batch_size: 1
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 8
total_eval_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 12
mixed_precision_training: Native AMP
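The batch-size totals above are per-device size × number of devices: 1 × 8 = 8 for training and 2 × 8 = 16 for evaluation. This implies a gradient-accumulation factor of 1, which is an inference because the card does not list one. The `linear` scheduler decays the learning rate from 0.0008 to zero over the run. No warmup is listed, so the sketch below assumes zero warmup steps. The function names are hypothetical, and the total step count used in the usage lines is illustrative only. The decay formula follows the linear-with-warmup schedule used by Hugging Face Transformers.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Global batch size per optimizer step under data-parallel (DDP) training.

    grad_accum_steps=1 is an assumption: the card lists no gradient accumulation.
    """
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (if any), then linear decay to zero at total_steps.

    warmup_steps=0 is an assumption: the card lists no warmup.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(1, 8))  # total_train_batch_size
print(effective_batch_size(2, 8))  # total_eval_batch_size

# Illustrative only: a hypothetical 1000-step run with the card's peak learning rate.
for s in (0, 250, 500, 1000):
    print(s, linear_lr(s, total_steps=1000, base_lr=8e-4))
```

Because there is no warmup, training starts at the full 0.0008 and the rate falls by the same amount every step until it reaches zero at the final step.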

Training results

Training Loss	Epoch	Step	Validation Loss
0.5907	0.2001	1258	0.5642
0.5363	0.4002	2516	0.5205
0.521	0.6003	3774	0.4889
0.5061	0.8004	5032	0.4806
0.4959	1.0005	6290	0.4624
0.4793	1.2006	7548	0.4559
0.4771	1.4007	8806	0.4519
0.4701	1.6008	10064	0.4431
0.4734	1.8009	11322	0.4480
0.4913	2.0010	12580	0.4428
0.4523	2.2010	13838	0.4328
0.4518	2.4011	15096	0.4327
0.4503	2.6012	16354	0.4332
0.4428	2.8013	17612	0.4187
0.444	3.0014	18870	0.4209
0.4286	3.2015	20128	0.4180
0.4326	3.4016	21386	0.4132
0.4311	3.6017	22644	0.4099
0.4254	3.8018	23902	0.4069
0.4231	4.0019	25160	0.3985
0.4166	4.2020	26418	0.3994
0.4127	4.4021	27676	0.3949
0.4063	4.6022	28934	0.3907
0.4089	4.8023	30192	0.3885
0.4041	5.0024	31450	0.3907
0.3926	5.2025	32708	0.3882
0.39	5.4026	33966	0.3843
0.3899	5.6027	35224	0.3794
0.3902	5.8028	36482	0.3769
0.3896	6.0029	37740	0.3720
0.3781	6.2030	38998	0.3735
0.3764	6.4031	40256	0.3683
0.3719	6.6031	41514	0.3692
0.3767	6.8032	42772	0.3648
0.3745	7.0033	44030	0.3624
0.3593	7.2034	45288	0.3636
0.3603	7.4035	46546	0.3555
0.3596	7.6036	47804	0.3522
0.3567	7.8037	49062	0.3541
0.3553	8.0038	50320	0.3514
0.3427	8.2039	51578	0.3451
0.3434	8.4040	52836	0.3480
0.3465	8.6041	54094	0.3443
0.3411	8.8042	55352	0.3435
0.3402	9.0043	56610	0.3422
0.3253	9.2044	57868	0.3404
0.3251	9.4045	59126	0.3361
0.3263	9.6046	60384	0.3355
0.3258	9.8047	61642	0.3321
0.3289	10.0048	62900	0.3315
0.3093	10.2049	64158	0.3345
0.3113	10.4050	65416	0.3326
0.3084	10.6051	66674	0.3299
0.3098	10.8052	67932	0.3277
0.3064	11.0052	69190	0.3266
0.2951	11.2053	70448	0.3289
0.2951	11.4054	71706	0.3259
0.2939	11.6055	72964	0.3255
0.2923	11.8056	74222	0.3249
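The table is a periodic training log with an evaluation every 1258 optimizer steps, roughly every 0.2 epochs. The sketch below parses rows in this tab-separated layout and picks the row with the lowest validation loss. It also estimates optimizer steps per epoch from the Step/Epoch ratio. The epoch column is rounded to four decimals, so that estimate is approximate. It uses the latest row to keep the relative rounding error small. The embedded excerpt copies rows from the table above. The helper names (`LogRow`, `parse_log`, `best_by_val_loss`, `steps_per_epoch`) are hypothetical.

```python
from dataclasses import dataclass

# Excerpt of the log above: Training Loss, Epoch, Step, Validation Loss (tab-separated).
LOG = """\
0.5907\t0.2001\t1258\t0.5642
0.5363\t0.4002\t2516\t0.5205
0.2951\t11.4054\t71706\t0.3259
0.2939\t11.6055\t72964\t0.3255
0.2923\t11.8056\t74222\t0.3249
"""


@dataclass
class LogRow:
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_log(text: str) -> list[LogRow]:
    """Parse tab-separated log rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split("\t")
        rows.append(LogRow(float(train), float(epoch), int(step), float(val)))
    return rows


def best_by_val_loss(rows: list[LogRow]) -> LogRow:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def steps_per_epoch(rows: list[LogRow]) -> int:
    """Approximate optimizer steps per epoch from the latest Step/Epoch pair."""
    last = max(rows, key=lambda r: r.step)
    return round(last.step / last.epoch)


rows = parse_log(LOG)
best = best_by_val_loss(rows)
print(best.step, best.val_loss)
print(steps_per_epoch(rows))
```

On this excerpt the lowest validation loss is at the last logged step, and the estimate comes to about 6287 steps per epoch. With a global batch of 8 and no gradient accumulation, that corresponds to roughly 50k training examples per epoch.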