OpenFace-CQUPT committed
Commit d334b38 · verified · 1 Parent(s): 3eb8cc5

Update README.md

Files changed (1): README.md (+24 -3)

README.md CHANGED
@@ -4,7 +4,7 @@ language:
- zh
- en
---
- # PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding
+ # Pathology-LLaVA (PCaption-0.5M dataset)

We developed a domain-specific large language-vision assistant (PA-LLaVA) for pathology image understanding. Specifically, (1) we first construct a human pathology image-text dataset by cleaning public medical image-text data for domain-specific alignment; (2) using the proposed image-text data, we train a pathology language-image pretraining (PLIP) model as the specialized visual encoder for pathology images, and develop a scale-invariant connector to avoid the information loss caused by image scaling; (3) we adopt two-stage learning to train PA-LLaVA: the first stage performs domain alignment, and the second stage trains the end-to-end visual question answering (VQA) task.
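The scale-invariant connector is described only at a high level in this README. Purely as a rough sketch of the general idea — reducing a variable number of visual patch tokens to a fixed-length sequence before projecting into the language model's embedding space — something like the PyTorch module below would achieve scale invariance. The class name, dimensions, and pooling choice are hypothetical, not the authors' released design.

```
import torch
import torch.nn as nn

class ScaleInvariantConnector(nn.Module):
    """Illustrative sketch only; NOT the released PA-LLaVA connector."""

    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096, num_tokens: int = 256):
        super().__init__()
        # Pool any number of patch tokens down to a fixed count, so the
        # LLM-facing projection does not depend on the input image scale.
        self.pool = nn.AdaptiveAvgPool1d(num_tokens)
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vis_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (batch, n_patches, vis_dim); n_patches varies with image size.
        x = self.pool(vis_tokens.transpose(1, 2)).transpose(1, 2)  # (batch, num_tokens, vis_dim)
        return self.proj(x)  # (batch, num_tokens, llm_dim)

connector = ScaleInvariantConnector()
for n_patches in (576, 1024):  # two different image scales
    out = connector(torch.randn(1, n_patches, 1024))
    print(out.shape)  # torch.Size([1, 256, 4096]) both times
```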
 
@@ -16,10 +16,18 @@ We developed a domain-specific large language-vision assistant (PA-LLaVA) for pa
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663f06e01cd68975883a353e/SkFB0x3JunWE_Wae808Nq.png)

- ## Data Cleaning Process
+ ## Human Pathology Image-Text Data (PCaption-0.5M)
+
+ ### Introduction
+ These public datasets contain substantial amounts of data unrelated to human pathology. To obtain the human pathology image-text data, we performed two cleaning passes on the raw data, as illustrated in the figure below: (1) removing non-pathological images; (2) removing non-human pathology data. Additionally, we excluded image-text pairs whose textual descriptions contain fewer than 20 words. Ultimately, we obtained 518,413 image-text pairs (named "PCaption-0.5M") as the alignment training dataset.
+
+ In the instruction fine-tuning phase, we cleaned only PMC-VQA in the same way and obtained 15,788 question-answer pairs related to human pathology. Lastly, we combined PathVQA with the human pathology data obtained from PMC-VQA, constructing a dataset of 35,543 question-answer pairs.
+
+ #### Data Cleaning Process

![image/png](https://cdn-uploads.huggingface.co/production/uploads/663f06e01cd68975883a353e/IAeFWhH8brZYDaTJnew2N.png)

+ ### Get the Dataset

### Step 1: Download the public datasets
Here we only provide the download links for the public datasets and expose the image-id index of our cleaned dataset on HuggingFace.
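As a minimal sketch of the length filter described above — keeping only image-text pairs whose description has at least 20 words — the snippet below may be useful. The input file name and the `image`/`caption` keys are assumptions for illustration, not the released data schema.

```
import json

MIN_WORDS = 20  # threshold from the cleaning rule described above

def keep_pair(pair: dict) -> bool:
    # Drop pairs whose textual description is shorter than MIN_WORDS words.
    return len(pair.get("caption", "").split()) >= MIN_WORDS

with open("raw_pairs.json", encoding="utf-8") as f:  # hypothetical file name
    pairs = json.load(f)

cleaned = [p for p in pairs if keep_pair(p)]
print(f"kept {len(cleaned)} of {len(pairs)} image-text pairs")
```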
@@ -65,7 +73,7 @@ Finally, run dataformat.py to get the format needed to train the model.
python dataformat.py
```

- ## Model
+ ## Checkpoint
Our released weights are distributed training checkpoints that can be loaded directly for training through XTuner. If you need merged weights, they can be merged with XTuner (using the weights from the domain-alignment phase as an example):
```
xtuner convert pth_to_hf path/pallava_domain_alignment.py ./domain_alignment_weight.pth ./domain_alignment_weight_ft
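For context on the `python dataformat.py` step in the hunk above: the script produces the format the model trains on, which for XTuner's LLaVA-style pipelines is typically a JSON list of conversation records. The sketch below shows that general shape only; the released script's exact schema and prompt text may differ.

```
import json

def to_record(pair: dict, idx: int) -> dict:
    # LLaVA-style conversation record commonly used by XTuner pipelines;
    # the prompt string here is a placeholder, not from the released script.
    return {
        "id": str(idx),
        "image": pair["image"],
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe this pathology image."},
            {"from": "gpt", "value": pair["caption"]},
        ],
    }

with open("cleaned_pairs.json", encoding="utf-8") as f:  # hypothetical file name
    cleaned = json.load(f)

with open("train_data.json", "w", encoding="utf-8") as f:
    json.dump([to_record(p, i) for i, p in enumerate(cleaned)],
              f, ensure_ascii=False, indent=2)
```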
@@ -92,6 +100,19 @@ NPROC_PER_NODE=8 NNODES=2 PORT=12345 ADDR= NODE_RANK=0 xtuner train pallava_inst

![image/png](https://cdn-uploads.huggingface.co/production/uploads/663f06e01cd68975883a353e/ng9KbUevJk5HyYONpOg2S.png)
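A note on the launch line in this hunk's header: the variables follow XTuner's torchrun-style distributed launcher, in which NPROC_PER_NODE is the number of GPUs per node, NNODES the number of nodes, NODE_RANK the index of the current node (0 for the master), and ADDR/PORT the master node's address and port. ADDR is left blank above and presumably needs the master node's IP filled in before launching.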

+ ## Citation
+ ```
+ @misc{dai2024pallavalargelanguagevisionassistant,
+   title={PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding},
+   author={Dawei Dai and Yuanhui Zhang and Long Xu and Qianlan Yang and Xiaojing Shen and Shuyin Xia and Guoyin Wang},
+   year={2024},
+   eprint={2408.09530},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2408.09530},
+ }
+ ```
+
## Contact