{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "QgXrYJdQ5ci_" }, "source": [ "# Multimodal RAG using ColPali (with Byaldi) and Qwen2-VL\n", "\n", "[ColPali](https://huggingface.co/blog/manu/colpali) is a multimodal retriever that removes the need for hefty and brittle document processors. It handles images natively, encoding image patches so that they are compatible with text, which removes the need for OCR or image captioning.\n", "\n", "\n", "\n", "After indexing the data, we will use [Qwen2-VL-7B](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) for the generation step of the RAG pipeline.\n", "\n", "[Byaldi](https://github.com/AnswerDotAI/byaldi) is a new library by answer.ai that makes ColPali easy to use. The library is at a very early stage, so this notebook will likely be updated soon to follow API changes." ] }, { "cell_type": "markdown", "metadata": { "id": "WnRb4SDvwbbs" }, "source": [ "## Install Byaldi" ] }, { "cell_type": "markdown", "metadata": { "id": "TL3-FujJ6fgW" }, "source": [ "To get started, we install Byaldi with pip." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RlDOWAfa2O7m" }, "outputs": [], "source": [ "!pip install --upgrade byaldi" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "po2vHWznzhu5", "outputId": "8d4f4db9-c089-4719-b894-ba67e285d388" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading package lists... Done\n", "Building dependency tree... Done\n", "Reading state information... 
Done\n", "The following NEW packages will be installed:\n", " poppler-utils\n", "0 upgraded, 1 newly installed, 0 to remove and 49 not upgraded.\n", "Need to get 186 kB of archives.\n", "After this operation, 696 kB of additional disk space will be used.\n", "Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 poppler-utils amd64 22.02.0-2ubuntu0.5 [186 kB]\n", "Fetched 186 kB in 1s (150 kB/s)\n", "debconf: unable to initialize frontend: Dialog\n", "debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 1.)\n", "debconf: falling back to frontend: Readline\n", "debconf: unable to initialize frontend: Readline\n", "debconf: (This frontend requires a controlling tty.)\n", "debconf: falling back to frontend: Teletype\n", "dpkg-preconfigure: unable to re-open stdin: \n", "Selecting previously unselected package poppler-utils.\n", "(Reading database ... 123597 files and directories currently installed.)\n", "Preparing to unpack .../poppler-utils_22.02.0-2ubuntu0.5_amd64.deb ...\n", "Unpacking poppler-utils (22.02.0-2ubuntu0.5) ...\n", "Setting up poppler-utils (22.02.0-2ubuntu0.5) ...\n", "Processing triggers for man-db (2.10.2-1) ...\n" ] } ], "source": [ "!sudo apt-get install -y poppler-utils" ] }, { "cell_type": "markdown", "metadata": { "id": "flc7Eyta-F8H" }, "source": [ "We need to install transformers from the main branch to get Qwen2-VL support." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Zy-wWGGxzKsk", "outputId": "aa3b4257-81be-44f3-87f6-3836060e7872" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", " Building wheel for transformers (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n" ] } ], "source": [ "!pip install -q pdf2image git+https://github.com/huggingface/transformers.git qwen-vl-utils flash-attn" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 145, "referenced_widgets": [ "6a78e5b76f454e338a04f9c0a565beb5", "8a4ca23ed0834be283a0b1779ad74da3", "097f4a791f804afdaa46e731eee61154", "5d02a6515606463b891be72e3c272fb7", "d709841c144d445cbc167bcd5eeb8f91", "c73f8283d22a4a60b1128b07ad7c8971", "8826974dff9b4ecebd2b41e646a02f67", "6498f8600e2d4fdb9fbb116e7d494c5f", "d10da93ba62f43b0a5547aa9bf012952", "c10cba7574064d809cc23a43415d7a5e", "17b2e03542404c4aa835cca0b6c15dcb", "677618efd43f4b54a51dd301c2707737", "0b79ab15db744998ad46595e67b63455", "c57e085dbce044d1bc5d6bd8cc61843c", "bd7539d339ae4a33a62247e68fbb38b9", "b2e5f1076192480789ba1dbec7674701", "13e265ea022f49fdb5184f916dcde7ba", "d9a89d52f7934c0c8151afcbd18b1967", "2413eb56a55a4e8b8eb7deec6faf5a98", "f1f7763b484140248060d9e715246567", "16b2ba24201e4ab7ae092e5b5ea59672", "e4f6ebd2a7bd4064b16d68da5fa732f9", "9122ea328ace4d15818d8fb0cc4da236", "c2dd9b45807e4a369e0296bf51d720c3", "626c149c810943bfacd49b74e8ca19cc", "d8efe718157a4bc3ad946caf3c0c89c2", "1de3960878aa4fd382c11172a73f1ef2", "bc54f16ffe7b49b0ac19f65c34ee43e5", "65a2ef3660a44c5292a46f8ebe5d26b8", "ea2d71450873423d911415d7561cd5b5", "05ed3c50cf0241eca9a65fbe9f03618d", "567aa4039c9f4c0890a6f48bcf0ecf01" ] }, "id": "2EbYqBHc2DdS", "outputId": "6fb628fd-c8a3-4bcd-9599-8f15c6f0cbf3" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6a78e5b76f454e338a04f9c0a565beb5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='