# short_first_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (see the sketch after this list):
- Loss: 3.3005
- Accuracy: 0.3879
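
The card does not state the training objective, but the loss/accuracy pair is consistent with language modeling, where the reported loss is typically the mean per-token cross-entropy. Under that assumption (not confirmed by the card), perplexity is simply its exponential:

```python
# Hypothetical conversion: perplexity from the reported eval loss,
# assuming it is the mean per-token cross-entropy.
import math

eval_loss = 3.3005
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 27.1
```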
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows this list):
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch implementation, `adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
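
The total train batch size of 256 follows from 32 per device × 8 gradient-accumulation steps, assuming a single device. Below is a minimal sketch of how these settings map onto `transformers.TrainingArguments`; the `output_dir` value and the choice of `fp16` (rather than `bf16`) for "Native AMP" are assumptions not stated in the card:

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="short_first_seed-42_1e-3",  # assumed output path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,   # 32 * 8 = 256 effective train batch size (single device assumed)
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=32_000,
    num_train_epochs=20.0,
    fp16=True,                       # "Native AMP"; fp16 vs. bf16 is an assumption
)
```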
### Training results
| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 6.1631        | 0.9997  | 1789  | 4.4155          | 0.2878   |
| 4.1584        | 1.9995  | 3578  | 3.9416          | 0.3233   |
| 3.7403        | 2.9999  | 5368  | 3.6939          | 0.3450   |
| 3.5113        | 3.9997  | 7157  | 3.5700          | 0.3573   |
| 3.4222        | 4.9994  | 8946  | 3.4928          | 0.3641   |
| 3.3102        | 5.9998  | 10736 | 3.4535          | 0.3679   |
| 3.2452        | 6.9996  | 12525 | 3.4313          | 0.3698   |
| 3.2019        | 7.9999  | 14315 | 3.4132          | 0.3718   |
| 3.171         | 8.9997  | 16104 | 3.3968          | 0.3739   |
| 3.1243        | 9.9995  | 17893 | 3.3861          | 0.3748   |
| 3.1023        | 10.9999 | 19683 | 3.3813          | 0.3758   |
| 3.0926        | 11.9997 | 21472 | 3.3752          | 0.3767   |
| 3.0887        | 12.9994 | 23261 | 3.3672          | 0.3776   |
| 3.0837        | 13.9998 | 25051 | 3.3726          | 0.3764   |
| 3.0384        | 14.9996 | 26840 | 3.3657          | 0.3774   |
| 3.0408        | 15.9999 | 28630 | 3.3631          | 0.3776   |
| 3.0488        | 16.9997 | 30419 | 3.3656          | 0.3779   |
| 3.0537        | 17.9995 | 32208 | 3.3497          | 0.3797   |
| 3.0           | 18.9999 | 33998 | 3.3133          | 0.3841   |
| 2.8427        | 19.9957 | 35780 | 3.3005          | 0.3879   |
### Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0
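
A hypothetical loading sketch under the framework versions above. The hub repo id and the use of a causal-LM head are assumptions based only on the card name and the loss/accuracy metrics; the card does not name the task or the upload location:

```python
# Assumed usage: load the checkpoint with Transformers 4.46.2 / PyTorch 2.5.1.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "short_first_seed-42_1e-3"  # placeholder; replace with the actual hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # causal-LM head is an assumption
```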