---
license: apache-2.0
tags:
- blip
- image-captioning
- vision-language
- transformers
- fine-tuned
- pytorch
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
library_name: transformers
pipeline_tag: image-to-text
---
# BLIP model fine-tuned on histopathology images

This model is a fine-tuned version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base) on a histopathology image dataset, reaching an average loss of 0.0098.
## Model description
The model was fine-tuned on the histopathology-image-caption-dataset for automatic captioning of histopathology images.
## Training procedure
The model was trained for 10 epochs with a batch size of 4 and a learning rate of 5e-5. Images and captions were prepared with the BLIP processor, and gradients were accumulated over 2 steps (an effective batch size of 8).
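The original training script is not included in this repository, but the hyperparameters above are enough to reconstruct the loop. The sketch below is a minimal reconstruction under those settings; `train_dataset` (assumed to yield PIL image/caption pairs), the AdamW optimizer, and the collate function are illustrative assumptions, not the exact original code.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

def collate(batch):
    # Assumption: each dataset item is a (PIL.Image, caption string) pair.
    images, captions = zip(*batch)
    enc = processor(images=list(images), text=list(captions),
                    padding=True, return_tensors="pt")
    labels = enc["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels  # caption tokens double as targets
    return enc

# train_dataset is a placeholder for the Kaggle histopathology dataset.
loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # optimizer choice is assumed
accum_steps = 2

model.train()
for epoch in range(10):
    for step, batch in enumerate(loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss / accum_steps  # scale for accumulation
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```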
## Usage for further fine-tuning
The last checkpoint is included in this repository under the `last_checkpoint` directory; you can load it to continue fine-tuning on another dataset.
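A minimal way to resume from that checkpoint, assuming `last_checkpoint` holds a standard `save_pretrained()` export (the `subfolder` argument is a stock `from_pretrained` parameter):

```python
from transformers import AutoProcessor, BlipForConditionalGeneration

# Load weights from the last_checkpoint subfolder of this repo
# (assumes a standard save_pretrained() layout inside that directory).
model = BlipForConditionalGeneration.from_pretrained(
    "ragunath-ravi/blip-histopathology-finetuned",
    subfolder="last_checkpoint",
)
processor = AutoProcessor.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")
# Continue fine-tuning with your own loop or the Trainer API.
```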
## Training details
- Dataset: Histopathology Image Caption Dataset (Kaggle)
- Base model: Salesforce/blip-image-captioning-base
- Training epochs: 10
- Batch size: 4
- Learning rate: 5e-5
- Gradient accumulation steps: 2
- Device: CUDA (if available)
## Usage for inference
```python
from transformers import AutoProcessor, BlipForConditionalGeneration
from PIL import Image

# Load model and processor
model = BlipForConditionalGeneration.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")
processor = AutoProcessor.from_pretrained("ragunath-ravi/blip-histopathology-finetuned")

# Load image
image = Image.open("path_to_histopathology_image.jpg").convert("RGB")

# Process image
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate caption
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
```
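If a GPU is available, moving the model and inputs to CUDA speeds up generation:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
pixel_values = pixel_values.to(device)
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
```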