🧬 ViDRiP-LLaVA: Multimodal Diagnostic Reasoning in Pathology

ViDRiP-LLaVA is a vision-language framework designed for instruction-based diagnostic reasoning using both image patches and video clips from pathology slides. It builds on LLaVA and extends it to the medical domain with domain-specific datasets and fine-tuned models.

🧠 Introducing ViDRiP-LLaVA: the first multimodal model for diagnostic reasoning in pathology through video-based instruction. πŸ”¬πŸ“½οΈ

Our method leverages chain-of-thought (CoT) prompting to distill the reasoning capabilities of LLMs. ViDRiP-LLaVA generates both detailed histological descriptions and final diagnoses, simulating how pathologists analyze and sign out cases.

πŸ“š Trained on 4,278 instructional video pairs

βš™οΈ Combines single-image + clip transfer and fine-tuning on segmented diagnostic videos


πŸ“š Datasets

πŸ”Ή ViDRiP_Instruct_Train

πŸ”Ή ViDRiP_Instruct_Train_Video

  • 4,000+ instruction-style samples
  • Each sample includes:
    • A pathology video clip
    • A diagnostic question
    • A multi-turn reasoning answer
  • Format: JSON + MP4 (see the illustrative sample below)
  • Croissant-compliant metadata for structured use
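
The field names below are an assumption based on the LLaVA-style conversation format the project builds on, not a verbatim copy of the released files; a training sample is expected to look roughly like this:

```json
{
  "id": "case_0001",
  "video": "case_0001.mp4",
  "conversations": [
    {"from": "human", "value": "<video>\nDescribe the histological findings in this clip."},
    {"from": "gpt", "value": "<step-by-step histological description>"},
    {"from": "human", "value": "What is the most likely diagnosis?"},
    {"from": "gpt", "value": "<final diagnosis with supporting reasoning>"}
  ]
}
```

Each sample pairs a clip reference with a diagnostic question and a multi-turn reasoning answer, matching the fields listed above.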

πŸ”Ή ViDRiP_Instruct_Test

πŸ”Ή ViDRiP_Instruct_Test_Video

  • Held-out test set of diagnostic Q&A pairs
  • Used for benchmarking reasoning performance

πŸ€– Models

πŸ”Έ ViDRiP_LLaVA_video

  • Vision-language model for video-based diagnostic reasoning
  • Trained on ViDRiP_Instruct_Train
  • Suitable for:
    • Medical VQA
    • Instructional explanation generation
    • Educational pathology summarization

πŸ”Έ ViDRiP_LLaVA_image

  • Vision-language model for patch-based diagnostic prompts
  • Useful for pathology captioning and single-frame inference

πŸš€ Quickstart

πŸ”§ Fine-tuning the model on the video dataset

```bash
./scripts/train/finetune_ov_video.sh
```

πŸͺ„ Fine-tuning with LoRA

```bash
./scripts/train/finetune_ov_video_lora.sh
```

πŸ”— Merge LoRA weights

```bash
./scripts/train/merge_lora_weights.py
```
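
The merge script's arguments are not documented here. Below is a minimal sketch that assumes it keeps the upstream LLaVA argument names (--model-path, --model-base, --save-model-path); check the script itself for the exact interface:

```bash
# Assumed flags, following upstream LLaVA's merge_lora_weights.py;
# verify against the script before running.
python ./scripts/train/merge_lora_weights.py \
    --model-path ./checkpoints/ViDRiP_LLaVA_video_lora \
    --model-base <base-model-or-path> \
    --save-model-path ./checkpoints/ViDRiP_LLaVA_video_merged
```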

πŸ§ͺ Usage / Demo

```bash
./doc/ViDRiP_LLaVA_trial.py
```

πŸ”§ Evaluate on our video dataset

We use lmms_eval to evaluate the performance of video diagnostic reasoning.

To benchmark ViDRiP-LLaVA and compare it with other models:

  1. Clone the lmms_eval repo
  2. Copy our evaluation task folder into it:
```bash
cp -r lmms_eval/tasks/ViDRiP_Instruct_Test /path/to/lmms_eval/tasks/
```

You can then run evaluation using the standard lmms_eval CLI interface.
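
A typical lmms_eval invocation is sketched below; the model type, checkpoint path, and task name are placeholders and should be adjusted to match the copied task folder and the checkpoint being benchmarked:

```bash
# Placeholder model type, checkpoint path, and task name — adjust to your setup.
accelerate launch -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=trinhvg/ViDRiP_LLaVA_video \
    --tasks ViDRiP_Instruct_Test \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```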

Citation:

Coming soon