PaliGemma2 for Medical Image Parsing

This repository provides a PaliGemma2 model fine-tuned for comprehensive medical image question answering and analysis. The model is based on google/paligemma2-10b-pt-224 and was trained on the FLARE 2025 medical multimodal dataset, which includes 19 medical imaging datasets, 50,996 images, and 58,112 question-answer pairs across 8 imaging modalities.

Dataset Summary

  • Total datasets: 19
  • Total images: 50,996
  • Total questions: 58,112
  • Modalities: Clinical, Dermatology, Endoscopy, Mammography, Microscopy, Retinography, Ultrasound, Xray
  • Task types: Classification, Counting, Detection, Multi-label Classification, Regression, Report_Generation, Instance Detection

Training Details

  • Model: PaliGemma2-10B (LoRA fine-tuned, 4-bit quantization)
  • Training epochs: 8
  • Per-device batch size: 1
  • Gradient accumulation steps: 32
  • Effective batch size: 32 (accumulation) x 1 (device) = 32
  • Optimizer: Paged AdamW 8-bit
  • Learning rate: 5e-5
  • Warmup ratio: 0.03
  • Max grad norm: 1.0
  • LoRA configuration: r=16, alpha=32, dropout=0.05
  • Target modules: q_proj, o_proj, k_proj, v_proj, gate_proj, up_proj, down_proj
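
The hyperparameters above can be collected into a configuration sketch. This is a minimal illustration, not the author's training script: the `task_type` value, the NF4 quantization type, the bfloat16 compute dtype, and the `output_dir` name are assumptions that this card does not state. `build_configs` is a hypothetical helper that requires `torch`, `transformers`, `peft`, and `bitsandbytes` to be installed.

```python
# Values copied from the Training Details list; anything marked "assumption"
# is not stated in this model card.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "o_proj", "k_proj", "v_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption
}

TRAINING_KWARGS = {
    "num_train_epochs": 8,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 32,
    "learning_rate": 5e-5,
    "warmup_ratio": 0.03,
    "max_grad_norm": 1.0,
    "optim": "paged_adamw_8bit",
}


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def build_configs(output_dir="flare25-paligemma2-lora"):  # hypothetical directory name
    """Build the quantization, LoRA, and trainer configs (hypothetical helper)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, TrainingArguments

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumption
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
    )
    lora_config = LoraConfig(**LORA_KWARGS)
    training_args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    return quant_config, lora_config, training_args


# Per-device batch size 1 with 32 accumulation steps on one device.
print(effective_batch_size(1, 32))
```

With a per-device batch size of 1, gradient accumulation is what brings each optimizer step up to 32 samples.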

Model Performance

Task                        Metric (Description)      Value      #Examples
classification              balanced accuracy         0.4723     3513
multi-label classification  F1 score (micro)          0.5040     1446
detection                   F1 score (IoU>0.5)        0.3446     255
instance_detection          F1 score (IoU>0.5)        0.0028     176
counting                    mean absolute error       295.6500   100
regression                  mean absolute error       16.5035    100
report_generation           GREEN score               0.7072     1945
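
This card does not say how the metrics above were computed. The sketch below shows common definitions of balanced accuracy, micro-averaged F1, IoU-matched detection F1, and mean absolute error in plain Python. The function names, the `(x1, y1, x2, y2)` box format, and the greedy box matching are assumptions for illustration and may differ from the official FLARE 2025 evaluation. The GREEN score is out of scope for this sketch.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1: pool TP/FP/FN over all samples and labels."""
    tp = fp = fn = 0
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        tp += len(t & p)
        fp += len(p - t)
        fn += len(t - p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(true_boxes, pred_boxes, threshold=0.5):
    """F1 where a prediction counts as a hit if it overlaps an unmatched
    ground-truth box with IoU above the threshold (greedy matching)."""
    unmatched = list(true_boxes)
    tp = 0
    for pred in pred_boxes:
        best = max(unmatched, key=lambda gt: iou(gt, pred), default=None)
        if best is not None and iou(best, pred) > threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(unmatched)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Higher is better for balanced accuracy and both F1 variants, while lower is better for mean absolute error.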

Usage

from transformers import (
    BitsAndBytesConfig,
    PaliGemmaForConditionalGeneration,
    PaliGemmaProcessor,
)
from peft import PeftModel
from PIL import Image
import torch

base_model_id = "google/paligemma2-10b-pt-224"
model_id = "yws0322/flare25-paligemma2"

# Load the processor and the 4-bit quantized base model, then attach the LoRA adapter.
processor = PaliGemmaProcessor.from_pretrained(base_model_id)
base_model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = PeftModel.from_pretrained(base_model, model_id)
model.eval()

image = Image.open("chest_xray.jpg").convert("RGB")
question = "What are the key findings in this chest X-ray?"
image_token = "<image>"
prompt = f"{image_token * processor.image_seq_length}{processor.tokenizer.bos_token}Analyze the given medical image and answer the following question:\nQuestion: {question}\nPlease provide a clear and concise answer."

# Move the inputs to the model's device before generating.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[1]
response = processor.tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
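
To ask several questions about the same image, the prompt template from the snippet above can be wrapped in a small helper. This is a sketch: `build_prompt` is a hypothetical name, and the `256` and `"<bos>"` values in the example are placeholders. In real use, pass `processor.image_seq_length` and `processor.tokenizer.bos_token` as in the snippet above.

```python
INSTRUCTION = "Analyze the given medical image and answer the following question:"
SUFFIX = "Please provide a clear and concise answer."


def build_prompt(question, image_seq_length, bos_token, image_token="<image>"):
    """Reproduce the usage prompt: image tokens, BOS, instruction, question, suffix."""
    return (
        f"{image_token * image_seq_length}{bos_token}"
        f"{INSTRUCTION}\nQuestion: {question}\n{SUFFIX}"
    )


questions = [
    "What are the key findings in this chest X-ray?",
    "Is there evidence of pleural effusion?",
]
# Placeholder values; take the real ones from the loaded processor.
prompts = [build_prompt(q, image_seq_length=256, bos_token="<bos>") for q in questions]
print(len(prompts))
```

Keeping the template in one place helps ensure every question is sent in the same format as the usage example.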

Citation

If you use this model in your research, please cite:

@misc{flare25paligemma2025,
  title={FLARE25-PaliGemma2},
  author={Yeonwoo Seo},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yws0322/flare25-paligemma2}
}

@misc{paligemma2-base,
  title={PaliGemma2: Multimodal Vision-Language Model by Google Research},
  author={Google Research},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/google/paligemma2-10b-pt-224}
}

Model uploaded on 2025-06-03
