vit_focus_full

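This model is a fine-tuned version of an unspecified base model on an unspecified dataset (recorded as None). At the final logged evaluation (epoch 29.9855, step 1530) it achieves the following results on the evaluation set: Loss 0.0533, MSE 0.1327, MAE 0.3166.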

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 30
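
Two of the values above follow from the others, and a short sketch can make that explicit. The snippet assumes a single training device and zero warmup steps (neither is stated in this card), and takes the total of 1530 optimizer steps from the last row of the training results. The function name `linear_lr` is illustrative, not part of any library.

```python
# Hyperparameters as listed in this card.
learning_rate = 5e-05
train_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1      # assumption: not stated in the card
warmup_steps = 0     # assumption: no warmup is listed
total_steps = 1530   # last step in the training results table

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int) -> float:
    """Learning rate under linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(total_train_batch_size)
    for step in (0, 765, 1530):
        print(step, linear_lr(step))
```

Under these assumptions the effective batch size matches the listed total_train_batch_size of 32, and the learning rate falls linearly from 5e-05 at step 0 to zero at step 1530.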

Training Loss	Epoch	Step	Validation Loss	Mse	Mae
0.3146	0.9855	51	0.0595	0.1403	0.3265
0.2488	1.9855	102	0.0566	0.1395	0.3253
0.2278	2.9855	153	0.0611	0.1426	0.3288
0.206	3.9855	204	0.0536	0.1323	0.3180
0.1902	4.9855	255	0.0619	0.1411	0.3271
0.187	5.9855	306	0.0508	0.1320	0.3169
0.1757	6.9855	357	0.0537	0.1339	0.3183
0.1523	7.9855	408	0.0558	0.1330	0.3168
0.1528	8.9855	459	0.0591	0.1381	0.3225
0.1416	9.9855	510	0.0536	0.1353	0.3198
0.1298	10.9855	561	0.0530	0.1325	0.3164
0.1161	11.9855	612	0.0511	0.1315	0.3156
0.1085	12.9855	663	0.0531	0.1385	0.3243
0.1028	13.9855	714	0.0530	0.1316	0.3151
0.0891	14.9855	765	0.0540	0.1338	0.3178
0.0878	15.9855	816	0.0536	0.1335	0.3177
0.077	16.9855	867	0.0534	0.1299	0.3132
0.0769	17.9855	918	0.0549	0.1313	0.3149
0.0663	18.9855	969	0.0531	0.1291	0.3119
0.064	19.9855	1020	0.0540	0.1352	0.3197
0.0608	20.9855	1071	0.0535	0.1334	0.3179
0.0548	21.9855	1122	0.0529	0.1299	0.3134
0.0517	22.9855	1173	0.0534	0.1310	0.3152
0.0498	23.9855	1224	0.0544	0.1314	0.3151
0.047	24.9855	1275	0.0531	0.1309	0.3145
0.0443	25.9855	1326	0.0537	0.1325	0.3164
0.042	26.9855	1377	0.0533	0.1319	0.3156
0.0397	27.9855	1428	0.0530	0.1317	0.3155
0.0411	28.9855	1479	0.0542	0.1328	0.3167
0.0382	29.9855	1530	0.0533	0.1327	0.3166
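
To pick a checkpoint from the log above, one option is to rank rows by a validation metric. The sketch below uses a hand-copied excerpt of five rows from the table (not the full log) and a hypothetical helper `best_row`; which metric should drive checkpoint selection is not stated in this card.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    mse: float
    mae: float


# Excerpt copied from the training results table (not all 30 rows).
ROWS = [
    EvalRow(0.9855, 51, 0.0595, 0.1403, 0.3265),
    EvalRow(5.9855, 306, 0.0508, 0.1320, 0.3169),
    EvalRow(11.9855, 612, 0.0511, 0.1315, 0.3156),
    EvalRow(18.9855, 969, 0.0531, 0.1291, 0.3119),
    EvalRow(29.9855, 1530, 0.0533, 0.1327, 0.3166),
]


def best_row(rows, metric):
    """Return the row with the lowest value of the named metric."""
    return min(rows, key=lambda row: getattr(row, metric))


if __name__ == "__main__":
    for metric in ("val_loss", "mse", "mae"):
        row = best_row(ROWS, metric)
        print(f"{metric}: step {row.step}, value {getattr(row, metric)}")
```

On the full table, the lowest validation loss (0.0508) is at step 306, while the lowest MSE (0.1291) and MAE (0.3119) are both at step 969, so the choice of metric changes which checkpoint looks best.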

Safetensors

Model size: 24.3M params
Tensor type: F32
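
As a rough size check, the parameter count and tensor type above imply the footprint of the raw weights. The sketch assumes 4 bytes per F32 parameter and reads 24.3M as 24.3 million; it ignores file headers, activations, and optimizer state. The helper name `weight_bytes` is illustrative.

```python
BYTES_PER_F32 = 4  # a 32-bit float occupies 4 bytes


def weight_bytes(num_params: float, bytes_per_param: int = BYTES_PER_F32) -> float:
    """Approximate size of the raw weights in bytes."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    size = weight_bytes(24.3e6)
    print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
```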
