Upload README.md with huggingface_hub
README.md CHANGED
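The commit title refers to the `huggingface_hub` client; for context, a README upload like this one can be reproduced with the snippet below (a minimal sketch: the target repo id is an assumption inferred from the README title, and a write token must already be configured):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` or HF_TOKEN

# Create a commit that adds/overwrites README.md, matching this page's title.
# NOTE: the repo id below is an assumption; substitute the actual target repo.
api.upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id="trinhvg/ViDRiP-LLaVA",
    repo_type="model",
    commit_message="Upload README.md with huggingface_hub",
)
```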
@@ -1,8 +1,3 @@
----
-datasets:
-- trinhvg/ViDRiP_Instruct_Test
-- trinhvg/ViDRiP_Instruct_Train
----
 
 # 🧬 ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
 
@@ -26,11 +21,28 @@ Our method leverages chain-of-thought (CoT) prompting to distill the reasoning c
 
 ## 📚 Video Datasets
 
+### 🎥 Released Video Format
+
+All clips are:
+- **Cleaned** using a Visual Data Refinement pipeline (temporal trimming + YoloPath filtering + OCR exclusion + inpainting)
+- **Downsampled** to **1–5 FPS** to reduce file size and support fair-use compliance
+- **Muted** to avoid redistribution of original YouTube audio
+
+These steps preserve diagnostic signal while respecting the rights of YouTube creators and complying with [YouTube’s Terms of Service](https://www.youtube.com/t/terms).
+
+### 🔍 Training vs. Public Release Notice
+The ViDRiP-LLaVA models were trained on an internal dataset version that included:
+- Full-frame-rate video clips
+- Visual content **prior to OCR filtering**
+
+All **evaluations** (including those in our benchmark) are conducted using the **publicly released test set**, ensuring full reproducibility.
+
+
 ### 🔹 [ViDRiP_Instruct_Train](https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Train)
-The videos data is ~ 100 GB:
+The video data is ~60 GB:
 
 [//]: # (### 🔹 [ViDRiP_Instruct_Train_Video_GoogleDrive](https://drive.google.com/drive/folders/1oxZlaJpE7PGDYt32LeoGgIzwEvWdnupY?usp=sharing))
-### 🔹 [ViDRiP_Instruct_Train_Video_Hugging Face](https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Train) (There is
+### 🔹 [ViDRiP_Instruct_Train_Video_Hugging Face](https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Train) (There are 6 zip files)
 
 - 4,000+ instruction-style samples
 - Each sample includes:
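The "Released Video Format" block added above describes downsampling and muting, which map onto standard ffmpeg options; below is a minimal sketch of those two steps only, not the authors' actual pipeline (file names and the 3 FPS value are assumptions; trimming, YoloPath filtering, OCR exclusion, and inpainting are model-based steps not shown):

```python
import subprocess

def downsample_and_mute(src: str, dst: str, fps: int = 3) -> None:
    """Re-encode a clip at a low frame rate with the audio track removed.

    Covers only the "Downsampled" and "Muted" steps; the 3 FPS default is
    an assumed value inside the stated 1-5 FPS range.
    """
    subprocess.run(
        [
            "ffmpeg",
            "-y",                 # overwrite the output if it already exists
            "-i", src,            # input clip
            "-vf", f"fps={fps}",  # resample the video stream to `fps`
            "-an",                # drop the audio stream (mute)
            dst,
        ],
        check=True,
    )

downsample_and_mute("clip_raw.mp4", "clip_released.mp4")
```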
@@ -46,6 +58,8 @@ The videos data is ~ 100 GB:
 - Held-out test set of diagnostic Q&A pairs
 - Used for benchmarking reasoning performance
 
+
+
 ## 📚 Image Datasets
 We use publicly available datasets: Quilt-LLaVA and PathAsst.
 Please refer to their respective repositories for download instructions.
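Since both repos above are ordinary Hugging Face dataset repos, they can be fetched with `huggingface_hub`; a minimal sketch (the `*.zip` filter follows the "6 zip files" note, and the assumption that the archives sit at the repo root is untested):

```python
from huggingface_hub import snapshot_download

# Training videos: download only the zip archives (per the "6 zip files" note).
snapshot_download(
    repo_id="trinhvg/ViDRiP_Instruct_Train",
    repo_type="dataset",
    local_dir="ViDRiP_Instruct_Train",
    allow_patterns=["*.zip"],  # assumes archives are stored at the repo root
)

# Held-out test set of diagnostic Q&A pairs used for benchmarking.
snapshot_download(
    repo_id="trinhvg/ViDRiP_Instruct_Test",
    repo_type="dataset",
    local_dir="ViDRiP_Instruct_Test",
)
```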
@@ -139,4 +153,4 @@ All content usage complies with [**YouTube’s Terms of Service**](https://www.y
 
 * Not for **commercial use**
 * Not to be used in **clinical care** or **medical decision-making**
-* For **academic research, development, and evaluation only**
+* For **academic research, development, and evaluation only**