Update README.md
README.md CHANGED

@@ -78,7 +78,83 @@ You just need

```
pip install protobuf
```

## Data preparation

#### File structure

```
./data/llava_data
├── LLaVA-Pretrain
│   ├── blip_laion_cc_sbu_558k.json
│   ├── blip_laion_cc_sbu_558k_meta.json
│   └── images
├── LLaVA-Instruct-150K
│   └── llava_v1_5_mix665k.json
└── llava_images
    ├── coco
    │   └── train2017
    ├── gqa
    │   └── images
    ├── ocr_vqa
    │   └── images
    ├── textvqa
    │   └── train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2
```
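
If you'd like to create this skeleton up front before downloading anything, here is a minimal sketch (directory names are copied from the tree above; `./data/llava_data` as the root and bash brace expansion are assumptions):

```shell
# Pre-create the expected directory skeleton under ./data/llava_data.
mkdir -p ./data/llava_data/LLaVA-Pretrain/images
mkdir -p ./data/llava_data/LLaVA-Instruct-150K
mkdir -p ./data/llava_data/llava_images/{coco/train2017,gqa/images,ocr_vqa/images,textvqa/train_images,vg/{VG_100K,VG_100K_2}}
```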

#### Pretrain Data

Download the LLaVA-Pretrain dataset:

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
```
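
Note that `git clone` places the dataset in the current directory, while the `File structure` tree expects it under `./data/llava_data`. A sketch of one way to reconcile the two without copying the images (the symlink approach is an assumption, not from the original README):

```shell
# Link the cloned repository into the expected location instead of copying it.
mkdir -p ./data/llava_data
ln -s "$(pwd)/LLaVA-Pretrain" ./data/llava_data/LLaVA-Pretrain
```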

#### Finetune Data

1. Text data

   1. LLaVA-Instruct-150K

      ```shell
      # Make sure you have git-lfs installed (https://git-lfs.com)
      git lfs install
      git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
      ```

2. Image data

   1. COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)

   2. GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)

   3. OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)

      1. ⚠️ Rename OCR-VQA's images so that every file ends with the `.jpg` extension, e.g. with the script below!

         ```shell
         #!/bin/bash
         # Directory holding the downloaded OCR-VQA images.
         ocr_vqa_path="<your-directory-path>"

         # Copy every non-.jpg file to a .jpg sibling so all images share one extension.
         # (cp keeps the originals; use mv instead to rename in place.)
         find "$ocr_vqa_path" -type f | while read -r file; do
             extension="${file##*.}"
             if [ "$extension" != "jpg" ]; then
                 cp -- "$file" "${file%.*}.jpg"
             fi
         done
         ```

   4. TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)

   5. VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
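
The list above only gives the links; as a convenience, a sketch that downloads and unpacks the zip archives into the tree from `File structure` (destination directories are assumptions based on that tree; OCR-VQA is omitted because it uses the download script linked in item 3):

```shell
# Download and unpack each image set into ./data/llava_data/llava_images.
cd ./data/llava_data/llava_images
wget http://images.cocodataset.org/zips/train2017.zip
unzip -q train2017.zip -d coco                                # -> coco/train2017
wget https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip -O gqa_images.zip
unzip -q gqa_images.zip -d gqa                                # -> gqa/images
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip -q train_val_images.zip -d textvqa                      # -> textvqa/train_images
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip -O vg_part1.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip -O vg_part2.zip
unzip -q vg_part1.zip -d vg                                   # -> vg/VG_100K
unzip -q vg_part2.zip -d vg                                   # -> vg/VG_100K_2
```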
## Cheers! Train your model

1. Alignment module pretraining

```
NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2
```