paligemma_Malaysian_plate_recognition
This model is a fine-tuned version of google/paligemma-3b-pt-224 on the Malaysian license plate dataset.
import torch
from PIL import Image
from transformers import PaliGemmaProcessor, PaliGemmaForConditionalGeneration

# Hypothetical placeholder: set this to the path of the plate image to read.
image_path = "plate.jpg"
input_text = "extract the text from the image"

# Load the fine-tuned weights and the processor of the base model.
model = PaliGemmaForConditionalGeneration.from_pretrained("NYUAD-ComNets/VehiclePaliGemma", torch_dtype=torch.bfloat16)
processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

input_image = Image.open(image_path)
inputs = processor(text=input_text, images=input_image, padding="longest", do_convert_rgb=True, return_tensors="pt").to(device)
# Match the model's dtype (bfloat16) for the floating-point inputs.
inputs = inputs.to(dtype=model.dtype)

with torch.no_grad():
    output = model.generate(**inputs, max_length=500)

# Drop the echoed prompt from the decoded string and trim whitespace.
result = processor.decode(output[0], skip_special_tokens=True)[len(input_text):].strip()
print(result)
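The last line of the snippet above removes the prompt by slicing off its length, which assumes the decoded string always starts with the prompt. A slightly more defensive post-processing step might look like the sketch below. The helper names `extract_plate_text` and `normalize_plate` are hypothetical and not part of the model or of any library, and the normalization rule (uppercase, Latin letters and digits only) is an assumption about the desired plate format rather than something this model card specifies.

```python
import re


def extract_plate_text(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from a decoded generation, if it is present."""
    # Only slice when the decoded string really begins with the prompt;
    # otherwise return the text unchanged instead of cutting real characters.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return text.strip()


def normalize_plate(text: str) -> str:
    """Uppercase the text and keep only A-Z and 0-9 (assumed plate alphabet)."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


if __name__ == "__main__":
    prompt = "extract the text from the image"
    decoded = prompt + "\nwxy 1234"  # made-up example string, not model output
    print(normalize_plate(extract_plate_text(decoded, prompt)))
```

Whether spaces should be stripped depends on how the predictions are compared with the ground truth, so the normalization step is optional.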
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- num_epochs: 5
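The relationship between these values can be checked with a few lines of plain Python. The sketch below assumes a single training device (consistent with 2 × 8 = 16) and uses a hypothetical total of 100 optimizer steps, since the dataset size is not stated here. The function names are illustrative; the schedule mirrors the usual linear warmup-then-decay rule and is not copied from the training script.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, base_lr: float = 2e-5, warmup_steps: int = 2,
              total_steps: int = 100) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0.

    total_steps=100 is a hypothetical value chosen for illustration.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(2, 8))  # train_batch_size x gradient_accumulation_steps
    for step in (0, 1, 2, 50, 100):
        print(step, linear_lr(step))
```

With only 2 warmup steps, the learning rate reaches its peak of 2e-05 almost immediately and then decays linearly for the rest of training.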
Framework versions
- Transformers 4.42.4
- PyTorch 2.1.2+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
BibTeX entry and citation info
@misc{aldahoul2024advancingvehicleplaterecognition,
title={Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma},
author={Nouar AlDahoul and Myles Joshua Toledo Tan and Raghava Reddy Tera and Hezerul Abdul Karim and Chee How Lim and Manish Kumar Mishra and Yasir Zaki},
year={2024},
eprint={2412.14197},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.14197},
}