ViDRiP-LLaVA: Multimodal Diagnostic Reasoning in Pathology
ViDRiP-LLaVA is a vision-language framework designed for instruction-based diagnostic reasoning using both image patches and video clips from pathology slides. It builds on LLaVA and extends it to the medical domain with domain-specific datasets and fine-tuned models.
Introducing ViDRiP-LLaVA: the first multimodal model for diagnostic reasoning in pathology through video-based instruction.
Our method leverages chain-of-thought (CoT) prompting to distill the reasoning capabilities of LLMs. ViDRiP-LLaVA generates both detailed histological descriptions and final diagnoses, simulating how pathologists analyze and sign out cases.
- Trained on 4,278 instructional video pairs
- Combines single-image and clip transfer with fine-tuning on segmented diagnostic videos
Datasets
ViDRiP_Instruct_Train
ViDRiP_Instruct_Train_Video
- 4,000+ instruction-style samples
- Each sample includes (see the loading sketch after this list):
  - A pathology video clip
  - A diagnostic question
  - A multi-turn reasoning answer
- Format: JSON + MP4
- Croissant-compliant metadata for structured use
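For orientation, here is a minimal sketch of how one training entry might be loaded and what it could look like. The file path and field names are illustrative assumptions, not the dataset's actual schema; inspect a real entry before relying on any key.

```python
import json

# Illustrative path: point this at the ViDRiP_Instruct_Train JSON you downloaded.
with open("ViDRiP_Instruct_Train.json") as f:
    samples = json.load(f)

print(f"{len(samples)} training samples")

# Hypothetical shape of a single entry: a clip path plus a multi-turn
# question/reasoning exchange. The real field names may differ.
example = {
    "video": "clips/case_0001.mp4",
    "conversations": [
        {"from": "human",
         "value": "<video>\nDescribe the histological findings and provide a diagnosis."},
        {"from": "gpt",
         "value": "The clip shows ... Based on these features, the diagnosis is ..."},
    ],
}
```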
ViDRiP_Instruct_Test
ViDRiP_Instruct_Test_Video
- Held-out test set of diagnostic Q&A pairs
- Used for benchmarking reasoning performance
Models
ViDRiP_LLaVA_video
- Vision-language model for video-based diagnostic reasoning (a frame-sampling sketch follows this list)
- Trained on ViDRiP_Instruct_Train
- Suitable for:
  - Medical VQA
  - Instructional explanation generation
  - Educational pathology summarization
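The snippet below is a rough input-preparation sketch: it uniformly samples frames from a diagnostic clip with OpenCV so they can be handed to the video model. The clip path and frame count are assumptions; the actual prompt template and generation call depend on the repo's codebase (see ./doc/ViDRiP_LLaVA_trial.py), so they are only indicated in comments.

```python
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 32):
    """Uniformly sample RGB frames from a clip (the frame count is an assumption)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

# Placeholder clip path; use a clip from ViDRiP_Instruct_Train_Video.
frames = sample_frames("clips/case_0001.mp4")
# Pair `frames` with a diagnostic question and run generation through the
# demo script (./doc/ViDRiP_LLaVA_trial.py) or the repo's inference code.
```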
ViDRiP_LLaVA_image
- Vision-language model for patch-based diagnostic prompts
- Useful for pathology captioning and single-frame inference (a patch-tiling sketch follows below)
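As a small illustration of patch-based use, the sketch below tiles a larger field of view into square patches that can each be paired with a captioning or diagnostic prompt. The patch size and image path are assumptions, not values prescribed by the model.

```python
from PIL import Image

def tile_patches(image_path: str, patch_size: int = 448):
    """Split a field-of-view image into non-overlapping square patches (size is an assumption)."""
    img = Image.open(image_path).convert("RGB")
    w, h = img.size
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(img.crop((left, top, left + patch_size, top + patch_size)))
    return patches

# Placeholder image path; each patch can then be sent to ViDRiP_LLaVA_image
# together with a prompt such as "Describe the histological features."
patches = tile_patches("fields/case_0001_region.png")
print(f"{len(patches)} patches ready for captioning")
```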
Quickstart
Fine-tuning the model on the video dataset
./scripts/train/finetune_ov_video.sh
Fine-tuning with LoRA
./scripts/train/finetune_ov_video_lora.sh
Merging LoRA weights
./scripts/train/merge_lora_weights.py
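For reference, here is a minimal sketch of what a LoRA merge does, written with the peft library. The checkpoint paths and model class are assumptions; prefer the repo's merge_lora_weights.py, which knows the actual LLaVA-style architecture.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths for the base checkpoint and the adapter produced by
# finetune_ov_video_lora.sh; the real model class may differ for LLaVA-style models.
base = AutoModelForCausalLM.from_pretrained("path/to/base_model")
merged = PeftModel.from_pretrained(base, "path/to/lora_adapter").merge_and_unload()

merged.save_pretrained("path/to/merged_model")
AutoTokenizer.from_pretrained("path/to/base_model").save_pretrained("path/to/merged_model")
```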
Usage / Demo
./doc/ViDRiP_LLaVA_trial.py
Evaluating on our video dataset
We use lmms_eval to evaluate video diagnostic reasoning performance. To benchmark ViDRiP-LLaVA and compare it with other models:
- Clone the lmms_eval repo
- Copy our evaluation task folder into it:
cp -r lmms_eval/tasks/ViDRiP_Instruct_Test /path/to/lmms_eval/tasks/
You can then run evaluation using the standard lmms_eval CLI interface.
Citation:
Coming soon