pali_191805

This model is a fine-tuned version of google/paligemma-3b-pt-224. The training dataset is not specified (the card records it as None), and no summary evaluation results are listed.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2
num_epochs: 5
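
The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only and is not the training script. It assumes a single training device (the card does not state the device count), estimates the number of optimizer steps per epoch from one logged row (step 450 at epoch 0.4), and models the linear schedule as a linear warmup followed by a linear decay to zero. The helper names `effective_batch_size`, `steps_per_epoch`, and `linear_lr` are hypothetical.

```python
# Hyperparameters copied from the list above.
HPARAMS = {
    "learning_rate": 2e-05,
    "train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "total_train_batch_size": 16,
    "lr_scheduler_warmup_steps": 2,
    "num_epochs": 5,
}

# Assumption: one training device. The card does not report the device count.
NUM_DEVICES = 1


def effective_batch_size(per_device, accumulation_steps, num_devices=NUM_DEVICES):
    """Number of examples consumed per optimizer step."""
    return per_device * accumulation_steps * num_devices


def steps_per_epoch(logged_step, logged_epoch):
    """Estimate optimizer steps per epoch from one (step, epoch) log row."""
    return round(logged_step / logged_epoch)


def linear_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


batch = effective_batch_size(
    HPARAMS["train_batch_size"], HPARAMS["gradient_accumulation_steps"]
)
per_epoch = steps_per_epoch(450, 0.4)  # logged row: epoch 0.4 at step 450
total_steps = per_epoch * HPARAMS["num_epochs"]

print(batch, per_epoch, total_steps)  # prints: 16 1125 5625
print(linear_lr(1250, HPARAMS["learning_rate"],
                HPARAMS["lr_scheduler_warmup_steps"], total_steps))
```

Under these assumptions, 2 examples per device times 8 accumulation steps gives the reported total batch size of 16, and roughly 1125 optimizer steps per epoch suggests a training set of about 18,000 examples. Both figures are estimates derived from the logged values, not numbers stated by the card.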

Training Loss	Epoch	Step	Validation Loss
18.2133	0.0444	50	1.7766
11.9694	0.0889	100	1.3043
9.7625	0.1333	150	1.1940
9.0576	0.1778	200	1.1325
9.3286	0.2222	250	1.0906
8.5435	0.2667	300	1.0586
8.2508	0.3111	350	1.0357
8.3642	0.3556	400	1.0151
8.0343	0.4	450	0.9982
8.1537	0.4444	500	0.9818
7.6705	0.4889	550	0.9672
7.6794	0.5333	600	0.9557
7.3842	0.5778	650	0.9470
7.5392	0.6222	700	0.9343
7.3926	0.6667	750	0.9233
7.5391	0.7111	800	0.9141
7.3299	0.7556	850	0.9053
7.3423	0.8	900	0.8974
7.4747	0.8444	950	0.8911
7.252	0.8889	1000	0.8832
7.1392	0.9333	1050	0.8783
6.9769	0.9778	1100	0.8719
7.0285	1.0222	1150	0.8665
6.8336	1.0667	1200	0.8613
6.748	1.1111	1250	0.8563