<p align="center"><img src="https://github.com/yuanze-lin/Olympus/blob/main/asset/olympus.png" alt="icon" width="150" height="150" style="vertical-align:middle; margin-right:5px;" /></p>

# Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025)

[arXiv](https://arxiv.org/pdf/2412.09612)
[Project Page](https://yuanze-lin.me/Olympus_page/)
[Model (Hugging Face)](https://huggingface.co/Yuanze/Olympus)

Official implementation of "Olympus: A Universal Task Router for Computer Vision Tasks"

**If you find our project helpful for your research, please kindly give us a :star2: and cite our paper :bookmark_tabs: :)**

## :mega: News
- [ ] Release the code for integration with task-specific models.
- [x] Release the training & inference code.
- [x] Release Olympus datasets.
- [x] Release the model of Olympus.

## :low_brightness: Overview

*(Overview figure)*

## Getting Started

### :hammer_and_wrench: Environment Installation <a href="#install" id="install"/>

To set up the environment, run the following in your shell:
```
git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python=3.10 -y
conda activate olympus
pip install -r requirements.txt
```
This creates the ```olympus``` environment we used.
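
For a quick sanity check that the install succeeded, something like the snippet below should run without errors. This is a minimal sketch; it assumes ```requirements.txt``` pulls in PyTorch and ```transformers```, as LLaVA-style codebases typically do.

```
# Hypothetical sanity check; torch/transformers are assumed to come from requirements.txt.
import torch
import transformers

print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```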

### Download Models & Data

We share our collected Olympus dataset as follows:

| Data | Link |
|------|------|
| Olympus Task-wise Data | [Olympus_20tasks_all](https://drive.google.com/drive/folders/1m3FYHarVG8eg7X7cMAC5N5NBG-p0ymw8?usp=drive_link) |
| Olympus Fine-tuning Data | [Olympus.json](https://drive.google.com/file/d/1CMLZLa6hkVN2K1ebCcJEOaFGc2cLeLQ7/view?usp=sharing) |

- ```Olympus_20tasks_all```: 20 JSON files, each corresponding to a specific task; refer to the routing-token definitions in our paper to identify the task associated with each file. The chain-of-action data is provided in ```coa.json```. Each of these 21 JSON files includes both training and test data.
- ```Olympus.json```: The final fine-tuning data.
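
As a quick way to inspect what you downloaded, the sketch below loads one task file and prints its size and record keys. The path is illustrative, and the schema is an assumption: the files are merged with ```llava_v1_5_mix665k.json```, so a LLaVA-style instruction format is expected.

```
# Illustrative inspection of one task file; the exact path and schema are assumptions.
import json

with open("jsons/coa.json") as f:
    records = json.load(f)

print(len(records), "records")
print(records[0].keys())  # expected LLaVA-style keys such as 'id' and 'conversations'
```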

(1) Download the Olympus model:
```
python download_olympus.py
```
It will save the ```Olympus``` model under the ```ckpts``` folder.
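
Under the hood, a download step like this typically wraps ```huggingface_hub```; here is a minimal sketch of the idea (the ```local_dir``` layout is an assumption, not the script's guaranteed behavior):

```
# Illustrative equivalent of the download step using huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Yuanze/Olympus", local_dir="ckpts/Olympus")
```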

(2) Download the Olympus data for fine-tuning:
```
python download_olympus_json.py
```
The JSON data will be saved as ```Olympus.json``` in the ```train_data``` folder. Note that ```Olympus.json``` combines ```llava_v1_5_mix665k.json``` with our collected data from the 20 tasks.

**If you want to merge the data manually, first create the ```jsons``` folder with ```mkdir jsons```, download all the JSON files from [Olympus_20tasks_all](https://drive.google.com/drive/folders/1m3FYHarVG8eg7X7cMAC5N5NBG-p0ymw8?usp=drive_link) and [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) into it, then run the merge script:**

```
python scripts/merge_data.py
```
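
Conceptually, the merge just concatenates the per-task JSON lists into one fine-tuning file; a minimal sketch of that idea (illustrative only, not the actual ```merge_data.py```):

```
# Illustrative merge: concatenate every JSON list under jsons/ into train_data/Olympus.json.
import json
from pathlib import Path

merged = []
for path in sorted(Path("jsons").glob("*.json")):
    with open(path) as f:
        merged.extend(json.load(f))

Path("train_data").mkdir(exist_ok=True)
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)
```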

(3) Download the Mipha-3B model for fine-tuning:
```
python download_mipha_3b.py
```
It will save the ```Mipha-3B``` model under the ```ckpts``` folder.

### Inference

Run the following code for inference:
```
model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
--prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
--model-path $MODELDIR \
--temperature 0 \
--conv-mode v0
```
Alternatively, you can run ```bash predict.sh``` as we did.

The prediction should look like this:
```
Input Prompt: Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output: <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>
```
Change the ```--prompt``` argument to customize the input as needed.
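
Each ```<task>...</task>``` span in the output is a routing token plus the instruction to forward to the corresponding task-specific model. Below is a minimal sketch of how such a response could be split into (token, instruction) pairs; the parser is illustrative and not part of this repo.

```
# Illustrative parser for routing tokens like <image_gen>...</image_gen>.
import re

response = (
    "<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
)

# \w covers tag names that start with a digit, e.g. 3D_gen_image.
for token, instruction in re.findall(r"<(\w+)>(.*?)</\1>", response, re.DOTALL):
    print(token, "->", instruction)
```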

### Visual Instruction Tuning

Please refer [here](https://github.com/haotian-liu/LLaVA/blob/9a26bd1435b4ac42c282757f2c16d34226575e96/README.md#visual-instruction-tuning) to prepare the instruction-tuning data. In particular, store the images from the different datasets under the ```train_data``` folder.

Run the following code to fine-tune the model:
```
bash scripts/mipha/finetune.sh
```

### Evaluation

To evaluate the model's performance on different benchmarks, see [Evaluation.md](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md).

Please place the evaluation data under the ```eval``` folder. The evaluation scripts are under ```scripts/mipha/eval/```.
For example, to test the model's performance on the VQAv2 dataset, simply run:
```
bash scripts/mipha/eval/vqav2.sh
```

## :crystal_ball: Supported Capabilities (Covering 20 tasks)

*(Figure: supported capabilities across 20 tasks)*

## :snowboarder: Diverse Applications

*(Figure: diverse applications)*

## Citation

If you find Olympus useful for your research and applications, please cite it using this BibTeX:

```
@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}
```

## Acknowledgement

Our project is built upon the following foundations:
- [Mipha](https://github.com/xmoanvaf/llava-phi): An impressive open-source project for lightweight vision-language assistants
- [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful open-source vision-language assistant project