Finetuning PaliGemma with AutoTrain
In this blog post, we will see how to finetune PaliGemma using AutoTrain for visual question answering (VQA) and captioning tasks.
AutoTrain is a no-code solution designed to make life easier for data scientists, machine learning engineers, and enthusiasts. It allows you to train (almost) any state-of-the-art model without writing a single line of code. To get started with AutoTrain, check out the docs and GitHub repo.
Dataset
You can use a dataset from the hub or a local dataset.
Hub Dataset
A hub dataset should be in the following format:
The columns of interest are:
- `image`: the image (`image_column`)
- `question`: the question (`prompt_text_column`)
- `multiple_choice_answer`: the answer (`text_column`)

Note: we use the above three columns for the VQA task. For the captioning task, we use only the `image` and `text_column` columns.
Local Dataset
If using a dataset locally, it should be formatted like this:
```
train/
├── 0001.jpg
├── 0002.jpg
├── 0003.jpg
├── .
├── .
├── .
└── metadata.jsonl
```
where `metadata.jsonl` looks like the following:

```json
{"file_name": "0001.jpg", "question": "What vehicles are shown?", "multiple_choice_answer": "motorcycles"}
{"file_name": "0002.jpg", "question": "Is the plane upside down?", "multiple_choice_answer": "no"}
{"file_name": "0003.jpg", "question": "What is the boy doing?", "multiple_choice_answer": "batting"}
```
The `metadata.jsonl` file must have a `file_name` column; the other column names can be changed.
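If you are building the local dataset programmatically, `metadata.jsonl` can be written with Python's standard library alone. A minimal sketch (the annotations here are placeholder values; replace them with your own):

```python
import json
from pathlib import Path

# Placeholder annotations: file_name is required by AutoTrain,
# the other keys can be named however you like.
annotations = [
    {"file_name": "0001.jpg", "question": "What vehicles are shown?", "multiple_choice_answer": "motorcycles"},
    {"file_name": "0002.jpg", "question": "Is the plane upside down?", "multiple_choice_answer": "no"},
]

train_dir = Path("train")
train_dir.mkdir(exist_ok=True)

# metadata.jsonl holds one JSON object per line, next to the images.
with open(train_dir / "metadata.jsonl", "w") as f:
    for row in annotations:
        f.write(json.dumps(row) + "\n")
```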
If you have validation data, you can add a folder in the same format as above.
NOTE: When using the AutoTrain UI, the folders need to be compressed as ZIP files. When train.zip is expanded, it should contain all the images and metadata.jsonl at the top level: no folders, no subfolders.
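One way to build such a flat archive is with Python's `zipfile` module, writing each file under its bare name. A sketch (the dummy files below stand in for your real images and metadata):

```python
import zipfile
from pathlib import Path

train_dir = Path("train")
train_dir.mkdir(exist_ok=True)
# Dummy files standing in for the real images and metadata.
for name in ["0001.jpg", "0002.jpg", "metadata.jsonl"]:
    (train_dir / name).touch()

# arcname=path.name drops the "train/" prefix, so the expanded
# ZIP has no folders or subfolders, as the AutoTrain UI expects.
with zipfile.ZipFile("train.zip", "w") as zf:
    for path in train_dir.iterdir():
        zf.write(path, arcname=path.name)
```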
Training Locally
Locally, AutoTrain can be used in both UI and CLI modes.

To install AutoTrain, use pip:

```bash
$ pip install -U autotrain-advanced
```

Once done, you can start the UI with:

```bash
$ autotrain app
```
Training using CLI/config
To train using a config file, create a `config.yml` that looks like the following:
```yaml
task: vlm:vqa
base_model: google/paligemma-3b-pt-224
project_name: autotrain-paligemma-finetuned-vqa
log: tensorboard
backend: local
data:
  path: abhishek/vqa_small
  train_split: train
  valid_split: validation
  column_mapping:
    image_column: image
    text_column: multiple_choice_answer
    prompt_text_column: question
params:
  epochs: 3
  batch_size: 2
  lr: 2e-5
  optimizer: adamw_torch
  scheduler: linear
  gradient_accumulation: 4
  mixed_precision: fp16
  peft: true
  quantization: int4
hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
```
The above config uses a dataset from the Hub. If using a local dataset, change the `data` section as follows:
```yaml
data:
  path: local_dataset_folder_path # where training and validation (optional) folders are
  train_split: train # name of training folder
  valid_split: validation # name of validation folder, or none
  column_mapping:
    image_column: image
    text_column: multiple_choice_answer
    prompt_text_column: question
```
Please double check the column mappings!
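One way to catch a mapping mistake before launching a run is to check that the text columns named in the mapping actually appear in `metadata.jsonl` (for local datasets, the image itself is referenced via the required `file_name` field). A minimal sketch; the demo file below is a throwaway, so point the check at your real `metadata.jsonl`:

```python
import json
from pathlib import Path

def check_column_mapping(jsonl_path, column_mapping):
    """Raise if any source column named in the mapping is missing from metadata.jsonl."""
    first_row = json.loads(Path(jsonl_path).read_text().splitlines()[0])
    missing = [col for col in column_mapping.values() if col not in first_row]
    if missing:
        raise ValueError(f"columns missing from {jsonl_path}: {missing}")

# Throwaway metadata.jsonl for the demo (replace with your real file).
Path("metadata.jsonl").write_text(
    json.dumps({"file_name": "0001.jpg", "question": "What vehicles are shown?",
                "multiple_choice_answer": "motorcycles"}) + "\n"
)

# Mirrors the text columns of the column_mapping section above.
mapping = {
    "text_column": "multiple_choice_answer",
    "prompt_text_column": "question",
}
check_column_mapping("metadata.jsonl", mapping)  # passes silently when the columns line up
```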
Once done, run:
```bash
$ export HF_USERNAME=your_hugging_face_username
$ export HF_TOKEN=your_hugging_face_write_token
$ autotrain --config path_to_config.yml
```
And wait and watch the training progress :)
Training using UI
Here's a screenshot of the UI with a Hugging Face Hub dataset:
And one with local dataset:
Again, take special care of column mappings ;)
Finally, your model can be pushed to the Hub (your choice) and will be available for use. In case of any issues, report them on the GitHub issue tracker.
Happy AutoTraining! 🤗