Configuration Parsing
Warning:
In adapter_config.json: "peft.base_model_name_or_path" must be a string
Qwen2.5-VL-3B-Instruct - MIMIC-CXR Fine-tuned
This repository contains a LoRA fine-tuned adapter for Qwen/Qwen2.5-VL-3B-Instruct, trained on the MIMIC-CXR dataset.
The goal is to adapt a powerful multimodal vision-language model for medical chest X-ray interpretation, generating clinical-style reports from chest radiographs.
How to Use
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
import torch
base_model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_id = "onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr"
# Load base model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
base_model_id,
device_map="auto",
torch_dtype=torch.bfloat16,
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
# Processor
processor = AutoProcessor.from_pretrained(base_model_id)
# Example inference
def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
text_input = processor.apply_chat_template(
sample[:1], tokenize=False, add_generation_prompt=True
)
image_inputs, _ = process_vision_info(sample)
model_inputs = processor(
text=[text_input],
images=image_inputs,
return_tensors="pt",
).to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
trimmed_generated_ids = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(
trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
return output_text[0]
sample = [
{'role': 'user',
'content': [{'type': 'image',
'image': "./chest_xray.jpg"},
{'type': 'text',
'text': 'Please analyze this chest X-ray and provide the findings and impression.'}]},
]
output = generate_text_from_sample(model, processor, sample)
print(output)
Model Details
- Base model: Qwen/Qwen2.5-VL-3B-Instruct
- Adapter type: LoRA (PEFT)
- Training objective: Supervised fine-tuning (SFT) on chest X-ray reports
- Dataset: MIMIC-CXR (radiology images + reports)
- Languages: English (medical reporting domain)
- Frameworks:
transformers
,peft
,trl
Intended Uses
Direct Use
- Generating radiology-style reports from chest X-ray images.
- Research on applying large multimodal models to medical imaging tasks.
Downstream Use
- Medical text generation tasks where radiological image context is available.
- Adaptation for other healthcare VQA (Visual Question Answering) tasks.
Out-of-Scope Use
โ ๏ธ Not for clinical decision-making.
This model is intended for research purposes only. Do not use it in medical practice without proper validation and regulatory approval.
- Downloads last month
- 22
Model tree for onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr
Base model
Qwen/Qwen2.5-VL-3B-Instruct