navvew committed d0adbeb (verified) · parent: fd301c2

Update README.md

Files changed (1): README.md (+63 -3)
README.md CHANGED (@@ -1,3 +1,63 @@). The previous README contained only the `license: cc-by-4.0` YAML front matter; the updated file follows.
---
language: en
tags:
- reranker
- RAG
- multimodal
- vision-language
- Qwen
license: cc-by-4.0
pipeline_tag: visual-document-retrieval
---

# DocReRank: Multi-Modal Reranker

This is the official model from the paper:

📄 **[DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)**

---

## ✅ Model Overview
- **Base model:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
- **Architecture:** vision-language reranker
- **Fine-tuning method:** PEFT (LoRA); see the merge sketch after this list
- **Training data:** generated by the **Single-Page Hard Negative Query Generation** pipeline
- **Purpose:** improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios

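Since the release is a PEFT (LoRA) adapter rather than standalone weights, it can optionally be folded into the base model once with `peft`'s standard `merge_and_unload()`. This is a generic `peft` operation, not a step the README prescribes; the sketch assumes `base_model` is loaded as in the "How to Use" section below, and the output directory name is made up.

```python
from peft import PeftModel

# Optional: merge the LoRA weights into the base model so inference runs
# without the adapter indirection (standard peft API).
merged = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").merge_and_unload()
merged.save_pretrained("docrerank-merged")  # hypothetical output directory
```
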
---

## ✅ How to Use

This adapter requires the base Qwen2-VL model.

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Load the base model in bfloat16 on the GPU
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Load the DocReRank LoRA adapter on top of it
model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()

# Load the processor (tokenizer + image preprocessor)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Example query and document page image
query = "What is the total revenue in the table?"
image = Image.open("sample_page.png")

# Qwen2-VL needs the image placeholder inserted via the chat template;
# passing raw text alongside the image leaves the vision tokens unaligned.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": query},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to("cuda", torch.bfloat16)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)

# Decode only the newly generated tokens, not the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```
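
The snippet above returns free-form text, but second-stage reranking needs a scalar relevance score per query-page pair so candidates can be sorted. The README does not state DocReRank's scoring interface, so the helper below is a minimal sketch under an assumed convention used by some generative rerankers: prompt the model with a relevance question and score the pair by the probability of a "Yes" answer token. The prompt wording, the Yes/No convention, and the candidate file names are all assumptions; it reuses `model`, `processor`, and `query` from the snippet above.

```python
# Hypothetical scoring helper: the relevance prompt and the Yes/No token
# convention are assumptions, not DocReRank's documented interface.
def relevance_score(query: str, image: Image.Image) -> float:
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text",
                 "text": f"Query: {query}\nIs this page relevant to the query? Answer Yes or No."},
            ],
        }
    ]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to("cuda", torch.bfloat16)

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits

    # First sub-token of "Yes" / "No" in the Qwen2 vocabulary
    yes_id = processor.tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = processor.tokenizer.encode("No", add_special_tokens=False)[0]
    return torch.softmax(logits[[yes_id, no_id]].float(), dim=-1)[0].item()

# Second-stage reranking: reorder pages returned by a first-stage retriever
candidate_pages = ["page_1.png", "page_2.png", "page_3.png"]  # hypothetical paths
scored = sorted(
    ((p, relevance_score(query, Image.open(p))) for p in candidate_pages),
    key=lambda pair: pair[1],
    reverse=True,
)
print(scored)
```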