Feature Extraction · Safetensors · English · minicpmv · VisRAG · custom_code

tcy6 committed (verified) · Commit 485cb4f · Parent(s): a932f2e

Update README.md

Files changed (1): README.md (+32 −12)
 
</a>
</div>

<p align="center">
  <a href="#-introduction">📖 Introduction</a> •
  <a href="#-news">🎉 News</a> •
  <a href="#-visrag-pipeline">✨ VisRAG Pipeline</a> •
  <a href="#-training">⚡️ Training</a>
</p>
<p align="center">
  <a href="#-requirements">📦 Requirements</a> •
  <a href="#-usage">🔧 Usage</a> •
  <a href="#-license">📄 License</a> •
  <a href="#-citation">📑 Citation</a> •
  <a href="#-contact">📧 Contact</a>
</p>

# 📖 Introduction

**VisRAG** is a novel vision-language model (VLM)-based RAG pipeline. Instead of first parsing the document to obtain text, the pipeline embeds the document directly as an image with a VLM and retrieves it to enhance the generation of a VLM. Compared to traditional text-based RAG, **VisRAG** maximizes the retention and utilization of the information in the original documents, eliminating the information loss introduced by the parsing process.
<p align="center"><img width=800 src="https://github.com/openbmb/VisRAG/blob/master/assets/main_figure.png?raw=true"/></p>

# 🎉 News

* 20241015: Released our train and test data, which can be found in the [VisRAG](https://huggingface.co/collections/openbmb/visrag-6717bbfb471bb018a49f1c69) Collection on Hugging Face, referenced at the beginning of this page.
* 20241014: Released our [paper](https://arxiv.org/abs/2410.10594) on arXiv and our [model](https://huggingface.co/openbmb/VisRAG-Ret) on Hugging Face.

# ✨ VisRAG Pipeline

## VisRAG-Ret

**VisRAG-Ret** is a document embedding model built on [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2), a vision-language model that integrates [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) as the vision encoder and [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) as the language model.

## VisRAG-Gen

In the paper, we use MiniCPM-V 2.0, MiniCPM-V 2.6, and GPT-4o as the generators; in practice, any VLM can serve as the generator, as sketched below.
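
Because the generator only needs to answer a query given one or more retrieved page images, plugging in a VLM takes a few lines. Below is a minimal, hypothetical sketch using MiniCPM-V 2.6 as the generator; the `msgs`/`chat` interface follows that model's card, but signatures differ across VLMs, so verify the call against the generator you choose. The query and file names are placeholders.

```python
# Hypothetical generation step: answer a query from retrieved page images
# with an off-the-shelf VLM (MiniCPM-V 2.6 here; any VLM with image input works).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()

query = "What was the total revenue in 2023?"  # placeholder query
# Page images returned by VisRAG-Ret; file names are placeholders.
pages = [Image.open(p).convert("RGB") for p in ["page_17.png", "page_42.png"]]

# MiniCPM-V 2.6 accepts interleaved images and text in a message's content list.
msgs = [{"role": "user", "content": pages + [query]}]
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```

For generators that accept only a single image, the retrieved pages can instead be concatenated into one image or processed page by page, as discussed in the paper.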

# ⚡️ Training

## VisRAG-Ret

Our training dataset of 362,110 query-document (Q-D) pairs for **VisRAG-Ret** comprises the train sets of openly available academic datasets (34%) and a synthetic dataset of pages from web-crawled PDF documents augmented with VLM-generated (GPT-4o) pseudo-queries (66%). It can be found in the `VisRAG` Collection on Hugging Face, referenced at the beginning of this page; a loading sketch follows.
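
A minimal sketch of pulling one of these training sets from the Hub; the repository ID and field layout below are assumptions, so check the datasets listed in the `VisRAG` Collection for the exact names:

```python
# Hypothetical loader for the synthetic Q-D pairs; verify the repo ID against
# the VisRAG Collection before use.
from datasets import load_dataset

train = load_dataset("openbmb/VisRAG-Ret-Train-Synthetic-data", split="train")
example = train[0]
print(example.keys())  # expect a pseudo-query string and a document page image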

## VisRAG-Gen

The generation part does not use any fine-tuning; we directly use off-the-shelf LLMs/VLMs for generation.

# 📦 Requirements

```
torch==2.1.2
torchvision==0.16.2
...
decord==0.6.0
Pillow==10.1.0
```

# 🔧 Usage

## VisRAG-Ret

```python
from transformers import AutoModel, AutoTokenizer
import torch
...
scores = (embeddings_query @ embeddings_doc.T)
print(scores.tolist())
```
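
`scores` holds one similarity per (query, document) pair, so turning it into a retrieval ranking is a one-liner. A minimal sketch, assuming `scores` is the `(num_queries, num_docs)` tensor produced above:

```python
import torch

# Rank documents for each query by similarity and keep the top-k
# (k=2 is illustrative; the top pages would then go to the generator).
topk_scores, topk_idx = torch.topk(scores, k=2, dim=-1)
for q, (idx, s) in enumerate(zip(topk_idx.tolist(), topk_scores.tolist())):
    print(f"query {q}: top documents {idx} (scores {s})")
```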

# 📄 License

* The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
* The usage of **VisRAG-Ret** model weights must strictly follow the [MiniCPM Model License](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
* The models and weights of **VisRAG-Ret** are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, **VisRAG-Ret** weights are also available for free commercial use.

# 📑 Citation

```
@misc{yu2024visragvisionbasedretrievalaugmentedgeneration,
...
}
```

# 📧 Contact

- Shi Yu: [email protected]
- Chaoyue Tang: [email protected]