---
language: en
tags:
- reranker
- RAG
- multimodal
- vision-language
- Qwen
license: cc-by-4.0
pipeline_tag: visual-document-retrieval
---

# DocReRank: Multi-Modal Reranker

This is the official model from the paper:

📄 **[DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)**

See the [Project Page](https://navvewas.github.io/DocReRank/) for more information.

---

## ✅ Model Overview

- **Base model:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
- **Architecture:** Vision-language reranker
- **Fine-tuning method:** PEFT (LoRA)
- **Training data:** generated by the **Single-Page Hard Negative Query Generation** pipeline
- **Purpose:** improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios

---

## ✅ How to Use

This adapter requires the base Qwen2-VL model.

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
import os

# ✅ Load base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

# ✅ Load DocReRank adapter
model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()

# ✅ Load processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor.image_processor.min_pixels = 200704
processor.image_processor.max_pixels = 589824

# ✅ Define query
query_text = "What are the performances of the DocReRank model on restaurant and biomedical benchmarks?"
# query_text = "Are there ablation results for the DocReRank model?"

# ✅ Download demo pages from the model repo
save_dir = os.path.join(os.getcwd(), "paper_pages")
os.makedirs(save_dir, exist_ok=True)
image_files = [
    "DocReRank_paper_page_2.png",
    "DocReRank_paper_page_4.png",
    "DocReRank_paper_page_6.png",
    "DocReRank_paper_page_8.png",
]
image_paths = []
for f in image_files:
    local_path = hf_hub_download(repo_id="DocReRank/DocReRank-Reranker", filename=f, local_dir=save_dir)
    image_paths.append(local_path)
print("✅ Files downloaded to:", image_paths)

# ✅ Reranking: prompt the model with the page image and the query, and read
# the probability of the "True" token as the relevance score.
def compute_score(image_path, query_text):
    image = Image.open(image_path)
    prompt = f"Assert the relevance of the previous image document to the following query, answer True or False. The query is: {query_text}"
    messages = [{"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": prompt}]}]

    # Tokenize
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=text, images=image, return_tensors="pt").to(model.device, torch.bfloat16)

    # Compute next-token logits and compare "True" vs. "False"
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits[:, -1, :]
    true_id = processor.tokenizer.convert_tokens_to_ids("True")
    false_id = processor.tokenizer.convert_tokens_to_ids("False")
    probs = torch.softmax(logits[:, [true_id, false_id]], dim=-1)
    relevance_score = probs[0, 0].item()  # Probability of "True"
    return relevance_score

# ✅ Compute scores for all pages
scores = [(img, compute_score(img, query_text)) for img in image_paths]

# ✅ Print results
for img, score in scores:
    print(f"Image: {img} | Relevance Score: {score:.4f}")
```
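In a full RAG pipeline, these scores are used to reorder the candidate pages returned by a first-stage retriever. Below is a minimal sketch of that final step, reusing the `scores` list from the snippet above; the top-2 cutoff is an arbitrary choice for illustration:

```python
# Sort pages by relevance score, highest first, to obtain the reranked order.
reranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
print("Reranked order:", [img for img, _ in reranked])

# Keep only the best pages for the generation stage (top-2 here is arbitrary).
top_pages = [img for img, _ in reranked[:2]]
print("Pages passed to the generator:", top_pages)
```

If you need to score many pages, PEFT's `merge_and_unload()` can optionally fold the LoRA weights into the base model to remove adapter overhead at inference time.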
## Citation

If you use this model, please cite:

```bibtex
@article{wasserman2025docrerank,
  title={DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers},
  author={Wasserman, Navve and Heinimann, Oliver and Golbari, Yuval and Zimbalist, Tal and Schwartz, Eli and Irani, Michal},
  journal={arXiv preprint arXiv:2505.22584},
  year={2025}
}
```