---
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

<p align="center"><img src="https://github.com/yuanze-lin/Olympus/blob/main/asset/olympus.png" alt="icon" width="150" height="150" style="vertical-align:middle; margin-right:5px;" /></p>

# Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025) <br />

[![PDF](https://img.shields.io/badge/PDF-Download-orange?style=flat-square&logo=adobeacrobatreader&logoColor=white)](https://arxiv.org/pdf/2412.09612)
[![arXiv](https://img.shields.io/badge/arXiv-2412.09612-b31b1b.svg)](https://arxiv.org/pdf/2412.09612)
[![Project Page](https://img.shields.io/badge/Project%20Page-Visit%20Now-0078D4?style=flat-square&logo=googlechrome&logoColor=white)](https://yuanze-lin.me/Olympus_page/)
[![Weights](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E)](https://huggingface.co/Yuanze/Olympus)

Official implementation of "Olympus: A Universal Task Router for Computer Vision Tasks"

**If you find our project helpful for your research, please kindly give us a :star2: and cite our paper :bookmark_tabs: :)**

## :mega: News
- [ ] Release the code for integration with task-specific models.
- [x] Release the training & inference code.
- [x] Release the Olympus datasets.
- [x] Release the Olympus model.

## :low_brightness: Overview

![image](https://github.com/yuanze-lin/Olympus/blob/main/asset/overview.png)

## Getting Started

### :hammer_and_wrench: Environment Installation <a href="#install" id="install"/>
To set up the environment, run the following in your shell:
```
git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python=3.10 -y
conda activate olympus
pip install -r requirements.txt
```
This creates and activates the ```olympus``` environment used in our experiments.

### Download Models & Data
We share our collected Olympus dataset as follows:

| Dataset | Link |
|---------|------|
| Olympus Task-wise Data | [Olympus_20tasks_all](https://drive.google.com/drive/folders/1m3FYHarVG8eg7X7cMAC5N5NBG-p0ymw8?usp=drive_link) |
| Olympus Fine-tuning Data | [Olympus.json](https://drive.google.com/file/d/1CMLZLa6hkVN2K1ebCcJEOaFGc2cLeLQ7/view?usp=sharing) |

- ```Olympus_20tasks_all```: 20 JSON files, each corresponding to a specific task; refer to the routing token definitions in our paper to identify the task associated with each file. Chain-of-action data is provided in ```coa.json```. Each of these 21 JSON files includes both training and test data.
- ```Olympus.json```: The final fine-tuning data. A quick sanity check for the downloads is sketched below.
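
After downloading, you can sanity-check any of the files with a few lines of Python. This sketch assumes only that each file is standard JSON; it prints the top-level type and size so you can see how a given file is organized:

```
import json

# Load one of the downloaded files, e.g. the chain-of-action data.
with open("jsons/coa.json") as f:
    data = json.load(f)

# Works whether the top level is a list of records or a dict of splits.
print(type(data).__name__, len(data))
```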

(1) Download the Olympus model:
```
python download_olympus.py
```
This saves the ```Olympus``` model under the ```ckpts``` folder.
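
For reference, the download amounts to a standard Hugging Face Hub snapshot. Below is a minimal sketch of the equivalent step, assuming only the public ```Yuanze/Olympus``` repo and the ```ckpts``` layout above (```download_olympus.py``` remains the authoritative script):

```
# Minimal sketch: fetch the Olympus weights into ckpts/Olympus.
# download_olympus.py is authoritative and may differ in details.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Yuanze/Olympus",   # public model repo on the Hub
    local_dir="ckpts/Olympus",  # layout expected by predict.py
)
```
The ```download_mipha_3b.py``` script in step (3) below presumably works the same way for the Mipha-3B base model.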

(2) Download the Olympus data for fine-tuning:
```
python download_olympus_json.py
```
The JSON data will be saved as ```Olympus.json``` in the ```train_data``` folder. Note that ```Olympus.json``` combines ```llava_v1_5_mix665k.json``` with our collected data from the 20 tasks.

**If you want to merge the data manually, first create a ```jsons``` folder (```mkdir jsons```), download all the JSON files from [Olympus_20tasks_all](https://drive.google.com/drive/folders/1m3FYHarVG8eg7X7cMAC5N5NBG-p0ymw8?usp=drive_link) and [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) into the ```jsons``` folder, then run the merge script:**

```
python scripts/merge_data.py
```
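
Conceptually, the merge concatenates the records from every JSON file under ```jsons``` into a single ```Olympus.json```. A minimal sketch under that assumption follows; ```scripts/merge_data.py``` is the authoritative implementation and may differ (for example, by keeping only the training splits):

```
# Minimal sketch: concatenate all record lists under jsons/ into
# train_data/Olympus.json. Assumes each file is a flat JSON list;
# scripts/merge_data.py is the authoritative implementation.
import json
from pathlib import Path

merged = []
for path in sorted(Path("jsons").glob("*.json")):
    with open(path) as f:
        merged.extend(json.load(f))

Path("train_data").mkdir(exist_ok=True)
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)

print(f"Merged {len(merged)} records.")
```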

(3) Download the Mipha-3B model for fine-tuning:
```
python download_mipha_3b.py
```
This saves the ```Mipha-3B``` model under the ```ckpts``` folder.

### Inference

Run the following code for inference:
```
model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
--prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
--model-path $MODELDIR \
--temperature 0 \
--conv-mode v0
```
Alternatively, you can run ```bash predict.sh``` as we did.

The prediction should look like this:
```
Input Prompt: Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output: <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>
```
Change ```--prompt``` to customize the input prompt as needed.
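
Each pair of routing tokens in the output delimits the prompt to forward to the corresponding task-specific model. A small parser along these lines recovers the chain of actions (a sketch; the tag names below are just the ones from the example above, not the full set covering all 20 tasks):

```
# Minimal sketch: extract (routing_token, prompt) pairs from Olympus output.
import re

ROUTING_TAG = re.compile(r"<(?P<task>[^<>/]+)>(?P<prompt>.*?)</(?P=task)>", re.DOTALL)

def parse_chain_of_actions(output):
    """Return the ordered list of (routing_token, prompt) pairs."""
    return [(m.group("task"), m.group("prompt").strip())
            for m in ROUTING_TAG.finditer(output)]

output = ("<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
          "<image_edit>change the cat's color to white.</image_edit>")
for task, prompt in parse_chain_of_actions(output):
    print(task, "->", prompt)  # e.g. image_gen -> a fluffy orange cat ...
```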

### Visual Instruction Tuning
Please refer [here](https://github.com/haotian-liu/LLaVA/blob/9a26bd1435b4ac42c282757f2c16d34226575e96/README.md#visual-instruction-tuning) to prepare the instruction-tuning data. In particular, store the images from the different datasets under the ```train_data``` folder.

Run the following code to fine-tune the model:
```
bash scripts/mipha/finetune.sh
```

### Evaluation
To evaluate the model's performance on different benchmarks, see [Evaluation.md](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md).

Please place the evaluation data under the ```eval``` folder. The evaluation scripts are located under ```scripts/mipha/eval/```.
For example, to test the model's performance on the VQAv2 dataset, simply run:

```
bash scripts/mipha/eval/vqav2.sh
```

## :crystal_ball: Supported Capabilities (Covering 20 Tasks)

![image](https://github.com/yuanze-lin/Olympus/blob/main/asset/capacities.png)

## :snowboarder: Diverse Applications

![image](https://github.com/yuanze-lin/Olympus/blob/main/asset/application.png)

## Citation

If you find Olympus useful for your research and applications, please cite it using this BibTeX:

```
@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}
```

## Acknowledgement
Our project is built upon the following foundations:

- [Mipha](https://github.com/xmoanvaf/llava-phi): An impressive open-source project for lightweight vision-language assistants
- [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful open-source vision-language assistant project