---
license: apache-2.0
tags:
- blip
- image-captioning
- vision-language
- transformers
- fine-tuned
- pytorch
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
library_name: transformers
pipeline_tag: image-to-text
---
# BLIP model fine-tuned on histopathology images

This model is a fine-tuned version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base) on a histopathology image dataset, reaching an average loss of 0.0098.
## Model description
The model was fine-tuned on the histopathology-image-caption-dataset for automatic captioning of histopathology images.
## Training procedure
The model was trained for 10 epochs with a batch size of 4 and a learning rate of 5e-5. Images and captions were prepared with the BLIP processor, and gradients were accumulated over 2 steps (an effective batch size of 8).
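The original training script is not included in this repository, but the hyperparameters above are enough to reconstruct the loop. The sketch below is a minimal reconstruction under those settings; `train_dataset` (assumed to yield PIL image/caption pairs), the AdamW optimizer, and the collate function are illustrative assumptions, not the exact original code.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

def collate(batch):
    # Assumption: each dataset item is a (PIL.Image, caption string) pair.
    images, captions = zip(*batch)
    enc = processor(images=list(images), text=list(captions),
                    padding=True, return_tensors="pt")
    labels = enc["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels  # caption tokens double as targets
    return enc

# train_dataset is a placeholder for the Kaggle histopathology dataset.
loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # optimizer choice is assumed
accum_steps = 2

model.train()
for epoch in range(10):
    for step, batch in enumerate(loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss / accum_steps  # scale for accumulation
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```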
## Usage for further fine-tuning
The last checkpoint is included in this repository under the `last_checkpoint` directory; you can load it to continue fine-tuning on another dataset.
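A minimal way to resume from that checkpoint, assuming `last_checkpoint` holds a standard `save_pretrained()` export (the `subfolder` argument is a stock `from_pretrained` parameter):

```python
from transformers import AutoProcessor, BlipForConditionalGeneration

# Load weights from the last_checkpoint subfolder of this repo
# (assumes a standard save_pretrained() layout inside that directory).
model = BlipForConditionalGeneration.from_pretrained(
    "ragunath-ravi/blip-histopathology-finetuned",
    subfolder="last_checkpoint",
)
processor = AutoProcessor.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")
# Continue fine-tuning with your own loop or the Trainer API.
```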
## Training details
- Dataset: Histopathology Image Caption Dataset (Kaggle)
- Base model: Salesforce/blip-image-captioning-base
- Training epochs: 10
- Batch size: 4
- Learning rate: 5e-5
- Gradient accumulation steps: 2
- Device: CUDA (if available)
## Usage for inference
```python
from transformers import AutoProcessor, BlipForConditionalGeneration
from PIL import Image

# Load model and processor
model = BlipForConditionalGeneration.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")
processor = AutoProcessor.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")

# Load image
image = Image.open("path_to_histopathology_image.jpg").convert("RGB")

# Process image
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate caption
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
```
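If a GPU is available, moving the model and inputs to CUDA speeds up generation:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
pixel_values = pixel_values.to(device)
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
```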