gating_network_qwen_1.5

This model is a fine-tuned version of Qwen/Qwen1.5-1.8B on an unknown dataset. Validation loss, accuracy, and F1 on the evaluation set are reported in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 1
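
The hyperparameters above can be written down as keyword arguments, together with a small function showing what a linear schedule does to the learning rate. This is a minimal sketch, not the original training script. The dictionary keys are the names used by `transformers.TrainingArguments`. Treating the batch sizes as per-device values assumes a single device, and zero warmup is assumed because the card lists none. The `linear_lr` helper and the 20,000-step total are hypothetical and used only for illustration.

```python
# Hyperparameters copied from the list above, keyed by the keyword names
# that transformers.TrainingArguments uses for them.
training_kwargs = {
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 16,  # assumption: a single device
    "per_device_eval_batch_size": 16,   # assumption: a single device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}


def linear_lr(step, total_steps, base_lr=5e-6, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total of 20,000 optimizer steps, chosen only for illustration.
for step in (0, 10_000, 20_000):
    print(step, linear_lr(step, total_steps=20_000))
```

With `transformers` installed, the dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is whatever the user chooses.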

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.311	0.0252	500	0.2788	0.9236	0.9216
0.2235	0.0503	1000	0.1763	0.9604	0.9591
0.1546	0.0755	1500	0.1805	0.9694	0.9692
0.1278	0.1006	2000	0.1261	0.9784	0.9779
0.0893	0.1258	2500	0.1286	0.9784	0.9788
0.0652	0.1510	3000	0.1357	0.9793	0.9787
0.0706	0.1761	3500	0.0899	0.9865	0.9864
0.0551	0.2013	4000	0.1000	0.9856	0.9849
0.0508	0.2264	4500	0.0662	0.9865	0.9859
0.0757	0.2516	5000	0.0883	0.9847	0.9840
0.0611	0.2768	5500	0.1417	0.9802	0.9797
0.0432	0.3019	6000	0.0545	0.9883	0.9876
0.0459	0.3271	6500	0.0732	0.9874	0.9870
0.0597	0.3522	7000	0.0711	0.9883	0.9883
0.0367	0.3774	7500	0.0742	0.9883	0.9884
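
To read the table programmatically, the following sketch picks out the rows with the lowest validation loss and the highest F1, and estimates the number of optimizer steps per epoch from the Epoch and Step columns. The rows are copied from the table above. The variable names are hypothetical, and the steps-per-epoch figure is only an estimate because the Epoch column is rounded to four decimals.

```python
# Rows copied from the table above:
# (training loss, epoch, step, validation loss, accuracy, F1)
rows = [
    (0.311, 0.0252, 500, 0.2788, 0.9236, 0.9216),
    (0.2235, 0.0503, 1000, 0.1763, 0.9604, 0.9591),
    (0.1546, 0.0755, 1500, 0.1805, 0.9694, 0.9692),
    (0.1278, 0.1006, 2000, 0.1261, 0.9784, 0.9779),
    (0.0893, 0.1258, 2500, 0.1286, 0.9784, 0.9788),
    (0.0652, 0.1510, 3000, 0.1357, 0.9793, 0.9787),
    (0.0706, 0.1761, 3500, 0.0899, 0.9865, 0.9864),
    (0.0551, 0.2013, 4000, 0.1000, 0.9856, 0.9849),
    (0.0508, 0.2264, 4500, 0.0662, 0.9865, 0.9859),
    (0.0757, 0.2516, 5000, 0.0883, 0.9847, 0.9840),
    (0.0611, 0.2768, 5500, 0.1417, 0.9802, 0.9797),
    (0.0432, 0.3019, 6000, 0.0545, 0.9883, 0.9876),
    (0.0459, 0.3271, 6500, 0.0732, 0.9874, 0.9870),
    (0.0597, 0.3522, 7000, 0.0711, 0.9883, 0.9883),
    (0.0367, 0.3774, 7500, 0.0742, 0.9883, 0.9884),
]

# Row with the lowest validation loss (column index 3).
best_loss_row = min(rows, key=lambda r: r[3])

# Row with the highest F1 (column index 5).
best_f1_row = max(rows, key=lambda r: r[5])

# Steps per epoch, estimated from the last row, where rounding of the
# Epoch column has the smallest relative effect.
_, last_epoch, last_step, *_ = rows[-1]
steps_per_epoch = last_step / last_epoch

print("lowest validation loss at step", best_loss_row[2])
print("highest F1 at step", best_f1_row[2])
print("estimated steps per epoch:", round(steps_per_epoch))
```

Among the rows shown, the lowest validation loss and the highest F1 fall at different steps, so which checkpoint counts as best depends on the metric chosen.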