paligemmafinetune3mix-448-modelSinDesbalance

This model is a fine-tuned version of google/paligemma-3b-mix-448 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8331
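
Since the repository ships a PEFT adapter (see the framework versions below), a minimal loading and inference sketch might look as follows. This is an assumption-laden example, not the documented usage: it presumes access to the gated google/paligemma-3b-mix-448 base checkpoint, and the image path and the "caption en" prompt prefix are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

base_id = "google/paligemma-3b-mix-448"
adapter_id = "RodrigoR07/paligemmafinetune3mix-448-modelSinDesbalance"

# Load the base model, then attach the fine-tuned adapter weights on top of it.
processor = AutoProcessor.from_pretrained(base_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# "example.jpg" and the "caption en" prefix are placeholders; substitute the
# image and task prompt this adapter was actually trained for.
image = Image.open("example.jpg")
inputs = processor(text="caption en", images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```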

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 15
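
A sketch of the corresponding TrainingArguments, assuming the standard Hugging Face Trainer API was used; the dataset, data collator, and PEFT/LoRA configuration are not documented in this card and are omitted, and output_dir is an assumed name.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="paligemmafinetune3mix-448-modelSinDesbalance",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # effective train batch size of 16
    num_train_epochs=15,
    lr_scheduler_type="linear",
    warmup_steps=2,
    optim="paged_adamw_8bit",       # OptimizerNames.PAGED_ADAMW_8BIT
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```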

Training results

Training Loss   Epoch     Step   Validation Loss
20.6012         0.9863    36     3.8261
13.5869         1.9863    72     2.4382
10.1392         2.9863    108    1.8341
8.3163          3.9863    144    1.4852
7.243           4.9863    180    1.2853
6.2677          5.9863    216    1.1375
5.5148          6.9863    252    1.0217
4.822           7.9863    288    0.9526
4.1282          8.9863    324    0.8958
3.5611          9.9863    360    0.8489
3.0227          10.9863   396    0.8394
2.6123          11.9863   432    0.8216
2.2456          12.9863   468    0.8221
2.0319          13.9863   504    0.8311
1.8671          14.9863   540    0.8331

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.5.1+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.1