RangeKing committed
Commit 2f7772b · unverified · 1 Parent(s): 7eaf80d

[Docs] Add `README_zh-CN` and correct `README` format (#6)


* Create README_zh-CN.md

* Update README.md

* fix typos

Files changed (2)
  1. README.md +31 -28
  2. README_zh-CN.md +163 -0
README.md CHANGED
@@ -6,7 +6,7 @@ Based on the open-source multi-modal model [OpenFlamingo](https://github.com/mlf
6
 
7
  The **joint training** of visual and language instructions effectively improves the performance of the model!
8
 
9
- Welcome to join us
10
 
11
  <div align="center">
12
  <a href="https://openmmlab.medium.com/" style="text-decoration:none;">
@@ -28,13 +28,14 @@ Welcome to join us!
28
  <img src="https://user-images.githubusercontent.com/25839884/219026120-ba71e48b-6e94-4bd4-b4e9-b7d175b5e362.png" width="3%" alt="" /></a>
29
  </div>
30
 
31
- # Features
32
 
33
  - Support various vision and language instruction data
34
  - Parameter-efficient fine-tuning with LoRA
35
  - Tuning vision and language at the same time so that they complement each other
36
 
37
- # Installation
 
38
 
39
  To install the package in an existing environment, run
40
 
@@ -52,17 +53,17 @@ conda env create -f environment.yml
52
  ```
53
 
54
 
55
- # Demo
56
 
57
  1. Download the pre-trained weights.
58
 
59
  Use [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) for converting LLaMA weights to HuggingFace format.
60
 
61
- Download the OpenFlamingo pre-trained model from [openflamingo/OpenFlamingo-9B](https://huggingface.co/openflamingo/OpenFlamingo-9B)
62
 
63
- Download our LoRA Weight from [here](https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt)
64
 
65
- Then place these models in checkpoints folders like this:
66
 
67
  ```
68
  checkpoints
@@ -81,56 +82,58 @@ conda env create -f environment.yml
81
  python app.py
82
  ```
83
 
84
- # Examples
85
 
86
  ### Recipe:
87
  ![image4](https://user-images.githubusercontent.com/12907710/234554562-8f3be88f-d563-47ba-97d9-ade8d47c46b0.png)
88
 
89
  ### Travel plan:
90
  ![image3](https://user-images.githubusercontent.com/12907710/234523464-80c4e3f0-f99f-4498-96ef-dc43ef89c64b.png)
 
91
  ### Movie:
92
  ![image2](https://user-images.githubusercontent.com/12907710/234523468-e11905a6-491f-4b87-934f-90da7d14d1c3.png)
 
93
  ### Famous person:
94
  ![image](https://user-images.githubusercontent.com/12907710/234523475-fd91f979-a344-4228-813f-6b55a1bc250f.png)
95
 
96
 
97
- # Fine-tuning
98
 
99
- ## Prepare datasets
100
 
101
  1. [A-OKVQA](https://allenai.org/project/a-okvqa/home)
102
 
103
- Download annotation from [this link](https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz) and unzip to `data/aokvqa/annotations`
104
 
105
  It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).
106
 
107
  2. [COCO Caption](https://cs.stanford.edu/people/karpathy/deepimagesent/)
108
 
109
- Download from [this link](https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip) and unzip to `data/coco`
110
 
111
  It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).
112
 
113
  3. [OCR VQA](https://ocr-vqa.github.io/)
114
 
115
- Download from [this link](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) and place in `data/OCR_VQA/`
116
 
117
  4. [LLaVA](https://llava-vl.github.io/)
118
 
119
- Download from [liuhaotian/LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) and place in `data/llava/`
120
 
121
  It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).
122
 
123
  5. [Mini-GPT4](https://minigpt-4.github.io/)
124
 
125
- Download from [Vision-CAIR/cc_sbu_align](https://huggingface.co/datasets/Vision-CAIR/cc_sbu_align) and place in `data/cc_sbu_align/`
126
 
127
  6. [Dolly 15k](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html)
128
 
129
- Download from [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and place it in `data/dolly/databricks-dolly-15k.jsonl`
130
 
131
  7. [Alpaca GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
132
 
133
- Download it from [this link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json) and place it in `data/alpaca_gpt4/alpaca_gpt4_data.json`
134
 
135
  You can also customize the data path in the [configs/dataset_config.py](configs/dataset_config.py).
136
 
@@ -139,20 +142,20 @@ You can also customize the data path in the [configs/dataset_config.py](configs/
139
 
140
  ```bash
141
  torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
142
- --lm_path checkpoints/llama-7b_hf \
143
- --tokenizer_path checkpoints/llama-7b_hf \
144
- --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
145
- --run_name train-my-gpt4 \
146
- --learning_rate 1e-5 \
147
- --lr_scheduler cosine \
148
- --batch_size 1 \
149
- --tuning_config configs/lora_config.py \
150
- --dataset_config configs/dataset_config.py \
151
- --report_to_wandb \
152
  ```
153
 
154
 
155
- # Acknowledgements
156
 
157
  - [OpenFlamingo](https://github.com/mlfoundations/open_flamingo)
158
  - [LAVIS](https://github.com/salesforce/LAVIS)
 
6
 
7
  The **joint training** of visual and language instructions effectively improves the performance of the model!
8
 
9
+ Welcome to join us!
10
 
11
  <div align="center">
12
  <a href="https://openmmlab.medium.com/" style="text-decoration:none;">
 
28
  <img src="https://user-images.githubusercontent.com/25839884/219026120-ba71e48b-6e94-4bd4-b4e9-b7d175b5e362.png" width="3%" alt="" /></a>
29
  </div>
30
 
31
+ ## Features
32
 
33
  - Support various vision and language instruction data
34
  - Parameter-efficient fine-tuning with LoRA
35
  - Tuning vision and language at the same time so that they complement each other
36
 
37
+
38
+ ## Installation
39
 
40
  To install the package in an existing environment, run
41
 
 
53
  ```
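The hunk above elides the command block itself, so for reference the installation steps look like this (mirroring the commands in the accompanying README_zh-CN.md; repository URL and file names are taken from that file):

```bash
# Install into an existing environment
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -v -e .
```

Alternatively, a fresh conda environment can be created from the provided spec:

```bash
conda env create -f environment.yml
```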
54
 
55
 
56
+ ## Demo
57
 
58
  1. Download the pre-trained weights.
59
 
60
  Use [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) for converting LLaMA weights to HuggingFace format.
61
 
62
+ Download the OpenFlamingo pre-trained model from [openflamingo/OpenFlamingo-9B](https://huggingface.co/openflamingo/OpenFlamingo-9B).
63
 
64
+ Download our LoRA weights from [here](https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt); a shell sketch of these download steps follows this section.
65
 
66
+ Then place these models in the `checkpoints` folder like this:
67
 
68
  ```
69
  checkpoints
 
82
  python app.py
83
  ```
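A minimal shell sketch of step 1, under a few assumptions: the raw LLaMA-7B weights already sit in `llama_raw/7B`, the conversion flags are those of the linked `convert_llama_weights_to_hf.py` script, and the OpenFlamingo checkpoint is pulled with a plain git-lfs clone (one of several ways to fetch a Hugging Face repository). Paths are illustrative.

```bash
# Convert the raw LLaMA-7B weights into the Hugging Face layout.
python convert_llama_weights_to_hf.py \
    --input_dir llama_raw --model_size 7B --output_dir checkpoints/llama-7b_hf

# Pull the OpenFlamingo-9B checkpoint (requires git-lfs).
git lfs install
git clone https://huggingface.co/openflamingo/OpenFlamingo-9B checkpoints/OpenFlamingo-9B

# Fetch the released LoRA weights into the same folder.
wget -P checkpoints https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt
```

After these commands the layout should match the `checkpoints` tree in step 1.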
84
 
85
+ ## Examples
86
 
87
  ### Recipe:
88
  ![image4](https://user-images.githubusercontent.com/12907710/234554562-8f3be88f-d563-47ba-97d9-ade8d47c46b0.png)
89
 
90
  ### Travel plan:
91
  ![image3](https://user-images.githubusercontent.com/12907710/234523464-80c4e3f0-f99f-4498-96ef-dc43ef89c64b.png)
92
+
93
  ### Movie:
94
  ![image2](https://user-images.githubusercontent.com/12907710/234523468-e11905a6-491f-4b87-934f-90da7d14d1c3.png)
95
+
96
  ### Famous person:
97
  ![image](https://user-images.githubusercontent.com/12907710/234523475-fd91f979-a344-4228-813f-6b55a1bc250f.png)
98
 
99
 
100
+ ## Fine-tuning
101
 
102
+ ### Prepare datasets
103
 
104
  1. [A-OKVQA](https://allenai.org/project/a-okvqa/home)
105
 
106
+ Download the annotations from [this link](https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz) and unzip them to `data/aokvqa/annotations`.
107
 
108
  It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).
109
 
110
  2. [COCO Caption](https://cs.stanford.edu/people/karpathy/deepimagesent/)
111
 
112
+ Download from [this link](https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip) and unzip it to `data/coco`.
113
 
114
  It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).
115
 
116
  3. [OCR VQA](https://ocr-vqa.github.io/)
117
 
118
+ Download from [this link](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) and place it in `data/OCR_VQA/`.
119
 
120
  4. [LLaVA](https://llava-vl.github.io/)
121
 
122
+ Download from [liuhaotian/LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) and place it in `data/llava/`.
123
 
124
  It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).
125
 
126
  5. [Mini-GPT4](https://minigpt-4.github.io/)
127
 
128
+ Download from [Vision-CAIR/cc_sbu_align](https://huggingface.co/datasets/Vision-CAIR/cc_sbu_align) and place it in `data/cc_sbu_align/`.
129
 
130
  6. [Dolly 15k](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html)
131
 
132
+ Download from [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and place it in `data/dolly/databricks-dolly-15k.jsonl`.
133
 
134
  7. [Alpaca GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
135
 
136
+ Download it from [this link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json) and place it in `data/alpaca_gpt4/alpaca_gpt4_data.json`.
137
 
138
  You can also customize the data paths in [configs/dataset_config.py](configs/dataset_config.py); a shell sketch of the direct-download steps above follows.
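Three of the sources above expose direct URLs, so those steps can be scripted; the sketch below covers only A-OKVQA, COCO Caption and Alpaca GPT4 (the OCR VQA, LLaVA, Mini-GPT4 and Dolly data come from Google Drive or the Hugging Face Hub and are simpler to fetch manually). Target paths mirror the defaults listed above; adjust if an archive unpacks into its own top-level folder.

```bash
# A-OKVQA annotations -> data/aokvqa/annotations
mkdir -p data/aokvqa/annotations
wget -qO- https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz \
    | tar -xzf - -C data/aokvqa/annotations

# COCO Caption (Karpathy split) -> data/coco
mkdir -p data/coco
wget -q https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip
unzip -q coco.zip -d data/coco && rm coco.zip

# Alpaca GPT4 instructions -> data/alpaca_gpt4/alpaca_gpt4_data.json
mkdir -p data/alpaca_gpt4
wget -q -O data/alpaca_gpt4/alpaca_gpt4_data.json \
    https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json
```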
139
 
 
142
 
143
  ```bash
144
  torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
145
+ --lm_path checkpoints/llama-7b_hf \
146
+ --tokenizer_path checkpoints/llama-7b_hf \
147
+ --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
148
+ --run_name train-my-gpt4 \
149
+ --learning_rate 1e-5 \
150
+ --lr_scheduler cosine \
151
+ --batch_size 1 \
152
+ --tuning_config configs/lora_config.py \
153
+ --dataset_config configs/dataset_config.py \
154
+ --report_to_wandb
155
  ```
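For a quick sanity check before committing all eight GPUs, the same entry point can presumably be launched on a single device; everything except `--nproc_per_node`, the illustrative run name, and the dropped `--report_to_wandb` flag is taken verbatim from the command above.

```bash
torchrun --nproc_per_node=1 mmgpt/train/instruction_finetune.py \
  --lm_path checkpoints/llama-7b_hf \
  --tokenizer_path checkpoints/llama-7b_hf \
  --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
  --run_name smoke-test \
  --learning_rate 1e-5 \
  --lr_scheduler cosine \
  --batch_size 1 \
  --tuning_config configs/lora_config.py \
  --dataset_config configs/dataset_config.py
```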
156
 
157
 
158
+ ## Acknowledgements
159
 
160
  - [OpenFlamingo](https://github.com/mlfoundations/open_flamingo)
161
  - [LAVIS](https://github.com/salesforce/LAVIS)
README_zh-CN.md ADDED
@@ -0,0 +1,163 @@
1
+ # 🤖 Multi-modal GPT
2
+
3
+ 使用视觉和语言指令训练一个多模态聊天机器人!
4
+
5
+ 基于开源多模态模型 [OpenFlamingo](https://github.com/mlfoundations/open_flamingo),我们使用公开数据集创建了各种**视觉指令**数据,包括视觉问答、图像字幕、视觉推理、文本 OCR 和视觉对话。此外,我们还使用仅包含**语言指令**数据的语言模型组件进行了训练。
6
+
7
+ 视觉和语言指令的**联合训练**有效提高了模型的性能!
8
+
9
+ 欢迎加入我们!
10
+
11
+ <div align="center">
12
+ <a href="https://openmmlab.medium.com/" style="text-decoration:none;">
13
+ <img src="https://user-images.githubusercontent.com/25839884/219255827-67c1a27f-f8c5-46a9-811d-5e57448c61d1.png" width="3%" alt="" /></a>
14
+ <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
15
+ <a href="https://discord.com/channels/1037617289144569886/1046608014234370059" style="text-decoration:none;">
16
+ <img src="https://user-images.githubusercontent.com/25839884/218347213-c080267f-cbb6-443e-8532-8e1ed9a58ea9.png" width="3%" alt="" /></a>
17
+ <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
18
+ <a href="https://twitter.com/OpenMMLab" style="text-decoration:none;">
19
+ <img src="https://user-images.githubusercontent.com/25839884/218346637-d30c8a0f-3eba-4699-8131-512fb06d46db.png" width="3%" alt="" /></a>
20
+ <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
21
+ <a href="https://www.youtube.com/openmmlab" style="text-decoration:none;">
22
+ <img src="https://user-images.githubusercontent.com/25839884/218346691-ceb2116a-465a-40af-8424-9f30d2348ca9.png" width="3%" alt="" /></a>
23
+ <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
24
+ <a href="https://space.bilibili.com/1293512903" style="text-decoration:none;">
25
+ <img src="https://user-images.githubusercontent.com/25839884/219026751-d7d14cce-a7c9-4e82-9942-8375fca65b99.png" width="3%" alt="" /></a>
26
+ <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
27
+ <a href="https://www.zhihu.com/people/openmmlab" style="text-decoration:none;">
28
+ <img src="https://user-images.githubusercontent.com/25839884/219026120-ba71e48b-6e94-4bd4-b4e9-b7d175b5e362.png" width="3%" alt="" /></a>
29
+ </div>
30
+
31
+ ## 特性
32
+
33
+ - 支持各种视觉和语言指令数据
34
+ - 使用 LoRA 进行参数高效微调
35
+ - 同时调整视觉和语言,相互补充
36
+
37
+ ## 安装
38
+
39
+ 在一个已有环境中安装依赖包，运行以下指令
40
+
41
+ ```bash
42
+ git clone https://github.com/open-mmlab/Multimodal-GPT.git
43
+ cd Multimodal-GPT
44
+ pip install -r requirements.txt
45
+ pip install -v -e .
46
+ ```
47
+
48
+ 或者创建一个新的 conda 环境
49
+
50
+ ```bash
51
+ conda env create -f environment.yml
52
+ ```
53
+
54
+ ## Demo
55
+
56
+ 1. 下载预训练权重
57
+
58
+ 使用[这个脚本](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py)把 LLaMA 权重转换成 HuggingFace 格式。
59
+
60
+ 从 [openflamingo/OpenFlamingo-9B](https://huggingface.co/openflamingo/OpenFlamingo-9B) 下载 OpenFlamingo 预训练模型。
61
+
62
+ 从[这个链接](https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt) 下载我们的 LoRA 权重。
63
+
64
+ 然后把所有模型权重放到 `checkpoints` 文件夹下,目录结构如下:
65
+
66
+ ```
67
+ checkpoints
68
+ ├── llama-7b_hf
69
+ │ ├── config.json
70
+ │ ├── pytorch_model-00001-of-00002.bin
71
+ │ ├── ......
72
+ │ └── tokenizer.model
73
+ ├── OpenFlamingo-9B
74
+ │   └── checkpoint.pt
75
+ ├── mmgpt-lora-v0-release.pt
+ ```
76
+
77
+ 2. 启动 gradio demo
78
+
79
+ ```bash
80
+ python app.py
81
+ ```
82
+
83
+ ## 示例
84
+
85
+ ### 菜谱:
86
+ ![image4](https://user-images.githubusercontent.com/12907710/234554562-8f3be88f-d563-47ba-97d9-ade8d47c46b0.png)
87
+
88
+ ### 旅行计划:
89
+ ![image3](https://user-images.githubusercontent.com/12907710/234523464-80c4e3f0-f99f-4498-96ef-dc43ef89c64b.png)
90
+
91
+ ### 电影:
92
+ ![image2](https://user-images.githubusercontent.com/12907710/234523468-e11905a6-491f-4b87-934f-90da7d14d1c3.png)
93
+
94
+ ### 名人:
95
+ ![image](https://user-images.githubusercontent.com/12907710/234523475-fd91f979-a344-4228-813f-6b55a1bc250f.png)
96
+
97
+
98
+ ## 微调 Fine-tuning
99
+
100
+ ### 准备数据集
101
+
102
+ 1. [A-OKVQA](https://allenai.org/project/a-okvqa/home)
103
+
104
+ 从[这个链接](https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz)下载标注,解压到 `data/aokvqa/annotations` 路径下。
105
+
106
+ 同时还需要 coco 数据集的图像,可以从[这里](https://cocodataset.org/#home)下载。
107
+
108
+ 2. [COCO Caption](https://cs.stanford.edu/people/karpathy/deepimagesent/)
109
+
110
+ 从[这个链接](https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip)下载并解压到 `data/coco` 路径下。
111
+
112
+ 同时还需要 coco 数据集的图像,可以从[这里](https://cocodataset.org/#home)下载。
113
+
114
+ 3. [OCR VQA](https://ocr-vqa.github.io/)
115
+
116
+ 从 [这个链接](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) 下载数据集,放到 `data/OCR_VQA/` 路径下。
117
+
118
+ 4. [LLaVA](https://llava-vl.github.io/)
119
+
120
+ 从 [liuhaotian/LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) 下载数据集,放到 `data/llava/` 路径下。
121
+
122
+ 同时还需要 coco 数据集的图像,可以从[这里](https://cocodataset.org/#home)下载。
123
+
124
+ 5. [Mini-GPT4](https://minigpt-4.github.io/)
125
+
126
+ 从 [Vision-CAIR/cc_sbu_align](https://huggingface.co/datasets/Vision-CAIR/cc_sbu_align) 下载数据集,放到 `data/cc_sbu_align/` 路径下。
127
+
128
+ 6. [Dolly 15k](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html)
129
+
130
+ 从 [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) 下载数据集,放到 `data/dolly/databricks-dolly-15k.jsonl` 路径下。
131
+
132
+ 7. [Alpaca GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
133
+
134
+ 从[这个链接](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json) 下载数据集,放到 `data/alpaca_gpt4/alpaca_gpt4_data.json` 路径下。
135
+
136
+ 你也可以在 [configs/dataset_config.py](configs/dataset_config.py) 文件中自定义数据集路径。
137
+
138
+
139
+ ## 开启训练
140
+
141
+ ```bash
142
+ torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
143
+ --lm_path checkpoints/llama-7b_hf \
144
+ --tokenizer_path checkpoints/llama-7b_hf \
145
+ --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
146
+ --run_name train-my-gpt4 \
147
+ --learning_rate 1e-5 \
148
+ --lr_scheduler cosine \
149
+ --batch_size 1 \
150
+ --tuning_config configs/lora_config.py \
151
+ --dataset_config configs/dataset_config.py \
152
+ --report_to_wandb
153
+ ```
154
+
155
+
156
+ ## 致谢
157
+
158
+ - [OpenFlamingo](https://github.com/mlfoundations/open_flamingo)
159
+ - [LAVIS](https://github.com/salesforce/LAVIS)
160
+ - [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
161
+ - [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
162
+ - [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main)
163
+ - [Instruction Tuning with GPT-4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)