# long_first_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.9978
- Accuracy: 0.3029
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
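The hyperparameters above imply an effective batch size of 32 × 8 = 256, matching the reported `total_train_batch_size`. A minimal sketch of that relationship, using a plain dict (the training script and dataset are not part of this card, so only the listed values appear):

```python
# Hyperparameters copied from the list above.
config = {
    "learning_rate": 1e-3,
    "train_batch_size": 32,       # per-device batch size
    "eval_batch_size": 64,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 32000,
    "num_epochs": 20.0,
}

# The effective (total) train batch size is the per-device batch size
# multiplied by the number of gradient accumulation steps.
total_train_batch_size = (
    config["train_batch_size"] * config["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 256
```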
### Training results
| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 6.3012        | 0.9997  | 1789  | 4.8455          | 0.2361   |
| 4.3382        | 1.9995  | 3578  | 4.4897          | 0.2574   |
| 3.9213        | 2.9999  | 5368  | 4.2905          | 0.2726   |
| 3.6894        | 3.9997  | 7157  | 4.1901          | 0.2807   |
| 3.5972        | 4.9994  | 8946  | 4.1275          | 0.2875   |
| 3.4782        | 5.9998  | 10736 | 4.1143          | 0.2896   |
| 3.408         | 6.9996  | 12525 | 4.0568          | 0.2945   |
| 3.3615        | 7.9999  | 14315 | 4.0610          | 0.2923   |
| 3.3271        | 8.9997  | 16104 | 4.0475          | 0.2953   |
| 3.2786        | 9.9995  | 17893 | 4.0470          | 0.2945   |
| 3.2554        | 10.9999 | 19683 | 4.0410          | 0.2956   |
| 3.2438        | 11.9997 | 21472 | 4.0579          | 0.2945   |
| 3.2387        | 12.9994 | 23261 | 4.0322          | 0.2972   |
| 3.2326        | 13.9998 | 25051 | 4.0446          | 0.2958   |
| 3.1863        | 14.9996 | 26840 | 4.0195          | 0.2968   |
| 3.1876        | 15.9999 | 28630 | 4.0171          | 0.2974   |
| 3.1947        | 16.9997 | 30419 | 4.0327          | 0.2962   |
| 3.1998        | 17.9995 | 32208 | 4.0206          | 0.2981   |
| 3.1444        | 18.9999 | 33998 | 3.9929          | 0.3008   |
| 2.9822        | 19.9957 | 35780 | 3.9978          | 0.3029   |
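If the reported loss is a mean token-level cross-entropy (as is typical for causal language models trained with `Trainer`; the card does not state the objective, so this is an assumption), the final validation loss converts to perplexity via `exp(loss)`:

```python
import math

# Final validation loss from the last row of the table above.
final_eval_loss = 3.9978

# Perplexity is exp(cross-entropy loss), assuming the loss is a
# mean per-token cross-entropy in nats.
perplexity = math.exp(final_eval_loss)
print(round(perplexity, 2))  # ~54.5
```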
### Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0