| license: mit | |
| title: PDF Insight PRO | |
| sdk: docker | |
| emoji: 💻 | |
| colorFrom: blue | |
| colorTo: green | |
| short_description: Agentic RAG APP | |
| # PDF Insight Pro | |
| An advanced PDF document analysis tool that combines RAG (Retrieval-Augmented Generation) with agentic search capabilities to answer queries about PDF documents. | |
| ## Table of Contents | |
| - [Overview](#overview) | |
| - [Features](#features) | |
| - [RAG System Metrics](#rag-system-metrics) | |
| - [Architecture](#architecture) | |
| - [Project Structure](#project-structure) | |
| - [Technical Stack](#technical-stack) | |
| - [Installation](#installation) | |
| - [Usage](#usage) | |
| - [API Endpoints](#api-endpoints) | |
| - [Deployment](#deployment) | |
| - [Android App](#android-app) | |
| - [License](#license) | |
| ## Overview | |
| PDF Insight Pro lets users upload PDF documents and ask questions about their content. It implements a RAG pipeline (document chunking, embedding generation, similarity search, and LLM processing) to return answers grounded in the uploaded document. | |
| An agentic layer can supplement the document's content with web search results when needed, so answers can draw on current information beyond the PDF. | |
| ## Features | |
| - **PDF Document Processing**: Upload and process PDF documents with automated text extraction and chunking | |
| - **Agentic RAG System**: Combines document retrieval with powerful LLM reasoning | |
| - **Web Search Integration**: Optionally verifies and supplements document information through the Tavily search API | |
| - **Session Management**: Persistent session handling for chat history and document context | |
| - **Multiple LLM Support**: Choose from different language models (Llama 4 Scout, Llama 3.1, Llama 3.3) | |
| - **FastAPI Backend**: High-performance API with async support | |
| - **Responsive UI**: User-friendly interface adaptable to different screen sizes | |
| - **Docker Containerization**: Easy deployment with containerized application | |
| - **Hugging Face Integration**: Automatic deployment to Hugging Face Spaces | |
| - **Android Application**: WebView-based mobile client | |
| ## RAG System Metrics | |
| 1. **Key Metrics Overview**: | |
| | Metric | Value | | |
| | ----------------------------------- | ------- | | |
| | **Semantic Similarity (Mean)** | `0.852` | | |
| | **ROUGE-L F1 Score (Mean)** | `0.395` | | |
| | **Semantic Similarity (Max)** | `1.000` | | |
| | **ROUGE-L F1 Score (Max)** | `1.000` | | |
| | **Semantic Similarity (Min)** | `0.592` | | |
| | **ROUGE-L F1 Score (Min)** | `0.099` | | |
| | **Standard Deviation (Similarity)** | `0.089` | | |
| | **Standard Deviation (ROUGE-L F1)** | `0.217` | | |
| 2. **Quantile Distribution**: | |
| | Percentile | Semantic Similarity | ROUGE-L F1 Score | | |
| | ---------- | ------------------- | ---------------- | | |
| | **25%** | `0.7946` | `0.2516` | | |
| | **50%** | `0.8732` | `0.3256` | | |
| | **75%** | `0.9181` | `0.4951` | | |
| 3. **Evaluation Status**: | |
| | Status | Count | Percentage | | |
| | ------ | ----- | ---------- | | |
| | PASS | `64` | `85.3%` | | |
| | FAIL | `11` | `14.7%` | | |
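The README does not say how these scores were computed. As a rough illustration of what ROUGE-L F1 measures, the sketch below uses the standard longest-common-subsequence definition with whitespace tokenization. The function names and the tokenization are assumptions for illustration, not the project's evaluation code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings, using lowercase whitespace tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Because the score rewards shared word order rather than shared meaning, a correct but paraphrased answer can have a low ROUGE-L F1 and still a high semantic similarity, which is consistent with the gap between the two means above.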
| ## Architecture | |
| The application follows a modular architecture with these main components: | |
| ### System Architecture Diagram | |
| ```mermaid | |
| --- | |
| config: | |
| theme: forest | |
| look: neo | |
| layout: dagre | |
| --- | |
| flowchart TD | |
| subgraph subGraph0["Presentation Layer"] | |
| direction TB | |
| Browser["Web Browser UI"] | |
| Android["Android WebView Client"] | |
| end | |
| subgraph subGraph1["API Layer"] | |
| direction TB | |
| APIGateway["FastAPI Entrypoints"] | |
| ChatRoutes["chat_routes.py"] | |
| SessionRoutes["session_routes.py"] | |
| UploadRoutes["upload_routes.py"] | |
| UtilityRoutes["utility_routes.py"] | |
| AppMain["app.py"] | |
| end | |
| subgraph subGraph2["Config & Models"] | |
| direction TB | |
| ConfigLoader["config.py"] | |
| DataModels["models.py"] | |
| end | |
| subgraph subGraph3["Service Layer"] | |
| direction TB | |
| RAGService["rag_service.py"] | |
| LLMService["llm_service.py"] | |
| SessionService["session_service.py"] | |
| end | |
| subgraph subGraph4["Utility Layer"] | |
| direction TB | |
| TextProc["text_processing.py"] | |
| FaissUtil["faiss_utils.py"] | |
| SessionUtil["session_utils.py"] | |
| end | |
| subgraph Storage["Storage"] | |
| direction TB | |
| UploadStore["/uploads (PDFs & sessions)"] | |
| FAISSIndex["FAISS Index (ephemeral/disk)"] | |
| end | |
| subgraph subGraph6["Docker Container"] | |
| direction TB | |
| subGraph1 | |
| subGraph2 | |
| subGraph3 | |
| subGraph4 | |
| Storage | |
| end | |
| subgraph subGraph7["External & DevOps"] | |
| direction TB | |
| GroqAPI["Groq LLM API"] | |
| TavilyAPI["Tavily Web Search API"] | |
| CI["GitHub Actions CI/CD"] | |
| HFS["HuggingFace Spaces"] | |
| DockerfileNode["Dockerfile"] | |
| end | |
| subgraph subGraph8["Static Assets"] | |
| direction TB | |
| StaticApp["Static Web App"] | |
| end | |
| Browser -- HTTP JSON --> StaticApp | |
| StaticApp -- HTTP JSON --> AppMain | |
| Android -- HTTP JSON --> AppMain | |
| AppMain -- routes --> ChatRoutes & SessionRoutes & UploadRoutes & UtilityRoutes | |
| ChatRoutes -- calls --> RAGService | |
| SessionRoutes -- calls --> SessionService | |
| UploadRoutes -- calls --> TextProc | |
| UtilityRoutes -- calls --> SessionUtil | |
| RAGService -- uses --> LLMService & FaissUtil | |
| RAGService -- calls --> GroqAPI & TavilyAPI | |
| LLMService -- uses --> ConfigLoader | |
| SessionService -- uses --> SessionUtil | |
| TextProc -- writes/reads --> UploadStore | |
| SessionUtil -- writes/reads --> UploadStore | |
| FaissUtil -- reads/writes --> FAISSIndex | |
| CI -- build & deploy --> DockerfileNode | |
| DockerfileNode -- deploy --> HFS | |
| Browser:::frontend | |
| Android:::frontend | |
| APIGateway:::api | |
| ChatRoutes:::api | |
| SessionRoutes:::api | |
| UploadRoutes:::api | |
| UtilityRoutes:::api | |
| AppMain:::api | |
| ConfigLoader:::service | |
| DataModels:::service | |
| RAGService:::service | |
| LLMService:::service | |
| SessionService:::service | |
| TextProc:::util | |
| FaissUtil:::util | |
| SessionUtil:::util | |
| UploadStore:::util | |
| FAISSIndex:::util | |
| GroqAPI:::external | |
| TavilyAPI:::external | |
| CI:::devops | |
| HFS:::devops | |
| DockerfileNode:::devops | |
| StaticApp:::frontend | |
| classDef frontend fill:#CCE5FF,stroke:#333,stroke-width:1px | |
| classDef api fill:#DFFFD6,stroke:#333,stroke-width:1px | |
| classDef service fill:#FFE5B4,stroke:#333,stroke-width:1px | |
| classDef util fill:#E3E4FA,stroke:#333,stroke-width:1px | |
| classDef external fill:#E0E0E0,stroke:#333,stroke-width:1px | |
| classDef devops fill:#CCFFFF,stroke:#333,stroke-width:1px | |
| click Android "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/Android%20App/app/src/main/res/layout/activity_splash.xml" | |
| click ChatRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/chat_routes.py" | |
| click SessionRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/session_routes.py" | |
| click UploadRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/upload_routes.py" | |
| click UtilityRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/utility_routes.py" | |
| click AppMain "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/app.py" | |
| click ConfigLoader "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/configs/config.py" | |
| click DataModels "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/models/models.py" | |
| click RAGService "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/services/rag_service.py" | |
| click LLMService "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/services/llm_service.py" | |
| click SessionService "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/services/session_service.py" | |
| click TextProc "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/utils/text_processing.py" | |
| click FaissUtil "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/utils/faiss_utils.py" | |
| click SessionUtil "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/utils/session_utils.py" | |
| click CI "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/.github/workflows/sync_to_hf.yml" | |
| click DockerfileNode "https://github.com/jatin-mehra119/pdf-insight-beta/tree/main/Dockerfile" | |
| click StaticApp "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/static/js/app.js" | |
| ``` | |
| ### Backend Components | |
| 1. **PDF Processing Module** (`utils/text_processing.py`, `utils/faiss_utils.py`): | |
| - Document loading and text extraction using PyMuPDF | |
| - Intelligent chunking with metadata preservation | |
| - Embedding generation with sentence transformers | |
| - FAISS vector index for similarity search | |
| 2. **RAG Engine** (`services/rag_service.py`, `services/llm_service.py`): | |
| - Context retrieval based on semantic similarity | |
| - LLM integration using Groq API | |
| - Agentic processing with tool-calling capabilities | |
| - Web search augmentation with Tavily API | |
| 3. **API Layer** (`app.py`, `api/`): | |
| - FastAPI framework for REST endpoints | |
| - Session management and persistence | |
| - File upload and processing | |
| - Chat interface and history management | |
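The chunking strategy is not spelled out in this README. One common approach, shown here as a minimal sketch, is a fixed-size word window with overlap that records the source page of each chunk. The `Chunk` class, `chunk_pages` function, and the default sizes are hypothetical; in a real pipeline the page strings would come from PyMuPDF (for example `page.get_text()`), whereas here they are plain strings.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int   # 1-based page number the chunk came from
    index: int  # position of the chunk within the whole document


def chunk_pages(pages, chunk_size=200, overlap=40):
    """Split per-page text into overlapping word windows, keeping page metadata."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for page_number, text in enumerate(pages, start=1):
        words = text.split()
        for start in range(0, len(words), step):
            window = words[start:start + chunk_size]
            chunks.append(Chunk(" ".join(window), page_number, len(chunks)))
            if start + chunk_size >= len(words):
                break  # this window already reached the end of the page
    return chunks
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, and the page number lets an answer point back to where its context came from.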
| ### Workflow | |
| 1. **Document Processing**: | |
| - User uploads a PDF document | |
| - System extracts text using PyMuPDF | |
| - Text is chunked into semantically meaningful segments | |
| - Embeddings are generated for each chunk | |
| - A FAISS index is built for efficient similarity search | |
| 2. **Query Processing**: | |
| - User submits a question about the document | |
| - System retrieves relevant chunks using semantic similarity | |
| - Relevant chunks are combined into a context window | |
| - Context and query are sent to the LLM for processing | |
| - Optional: Web search integration for fact verification | |
| 3. **Response Generation**: | |
| - LLM generates a response based on the provided context | |
| - If web search is enabled, additional information may be incorporated | |
| - Response is returned to the user | |
| - Chat history is updated and persisted | |
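The query-processing steps above can be sketched in a few lines. This is a dependency-free illustration, not the project's implementation: the bag-of-words `embed` function stands in for a sentence-transformer model, the sorted scan stands in for a FAISS index lookup, and the prompt wording and all function names are assumptions.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Join retrieved chunks into one context window followed by the question."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, start=1))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"
```

The string returned by `build_prompt` is what would be sent to the LLM. With web search enabled, search results could be appended to `context_chunks` before the prompt is built.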
| ## Project Structure | |
| The project is organized into a modular architecture with clear separation of concerns: | |
| ``` | |
| PDF-Insight-Beta/ | |
| ├── app.py # Main FastAPI application entry point | |
| ├── gen_dataset.py # Dataset generation and RAG evaluation scripts | |
| ├── test_RAG.ipynb # Jupyter notebook for RAG system testing and metrics | |
| ├── requirements.txt # Python dependencies | |
| ├── Dockerfile # Container configuration for deployment | |
| ├── LICENSE # MIT license file | |
| ├── README.md # Project documentation | |
| ├── README_hf.md # Hugging Face Spaces specific documentation | |
| ├── | |
| ├── api/ # API route handlers (modular FastAPI routes) | |
| │ ├── __init__.py # Exports all route handlers | |
| │ ├── chat_routes.py # Chat and conversation management endpoints | |
| │ ├── session_routes.py # Session lifecycle management | |
| │ ├── upload_routes.py # PDF upload and processing endpoints | |
| │ └── utility_routes.py # Utility endpoints (models, health checks) | |
| ├── | |
| ├── configs/ # Configuration management | |
| │ └── config.py # Centralized configuration and environment variables | |
| ├── | |
| ├── models/ # Pydantic data models | |
| │ └── models.py # Request/response models for API validation | |
| ├── | |
| ├── services/ # Core business logic services | |
| │ ├── __init__.py # Service module initialization | |
| │ ├── llm_service.py # Language model integration and management | |
| │ ├── rag_service.py # RAG implementation with agentic capabilities | |
| │ └── session_service.py # Session persistence and management | |
| ├── | |
| ├── utils/ # Utility functions and helpers | |
| │ ├── __init__.py # Utility module initialization | |
| │ ├── faiss_utils.py # FAISS vector database operations | |
| │ ├── session_utils.py # Session data serialization/deserialization | |
| │ └── text_processing.py # PDF text extraction and chunking utilities | |
| ├── | |
| ├── static/ # Frontend web application | |
| │ ├── index.html # Main web interface | |
| │ ├── css/ | |
| │ │ └── styles.css # Application styling and responsive design | |
| │ └── js/ | |
| │ └── app.js # Frontend JavaScript for user interactions | |
| ├── | |
| ├── development_scripts/ # Legacy and development utilities | |
| │ ├── app.py # Original monolithic application (deprecated) | |
| │ └── preprocessing.py # Original preprocessing functions (deprecated) | |
| ├── | |
| ├── uploads/ # Temporary storage for uploaded files and sessions | |
| │ ├── *.pdf # Uploaded PDF documents | |
| │ └── *_session.pkl # Serialized session data | |
| ├── | |
| └── Android App/ # Native Android application | |
|     ├── app/ # Android app source code | |
|     │ ├── src/main/java/com/jatinmehra/ # Java source files | |
|     │ ├── src/main/res/ # Android resources (layouts, drawables, etc.) | |
|     │ └── AndroidManifest.xml # Android app configuration | |
|     ├── gradle/ # Gradle build system files | |
|     └── build.gradle # Project build configuration | |
| ``` | |
| ### Key Components Description | |
| #### Core Application Files | |
| - **`app.py`**: Main FastAPI application that orchestrates all components and sets up the web server | |
| - **`gen_dataset.py`**: Comprehensive evaluation script for RAG system performance using the neural-bridge dataset | |
| - **`test_RAG.ipynb`**: Interactive Jupyter notebook for testing RAG capabilities and analyzing metrics | |
| #### API Layer (`api/`) | |
| - **`chat_routes.py`**: Handles chat interactions, query processing, and conversation flow | |
| - **`session_routes.py`**: Manages session lifecycle, history retrieval, and cleanup operations | |
| - **`upload_routes.py`**: Processes PDF uploads, text extraction, and document indexing | |
| - **`utility_routes.py`**: Provides system utilities like model listing and health checks | |
| #### Configuration (`configs/`) | |
| - **`config.py`**: Centralizes all application settings, API keys, model configurations, and environment variables | |
| #### Data Models (`models/`) | |
| - **`models.py`**: Defines Pydantic models for request/response validation and API documentation | |
| #### Business Logic (`services/`) | |
| - **`llm_service.py`**: Manages language model interactions, prompt engineering, and response generation | |
| - **`rag_service.py`**: Implements the core RAG pipeline with agentic search capabilities and tool integration | |
| - **`session_service.py`**: Handles session persistence, chat history, and user context management | |
| #### Utilities (`utils/`) | |
| - **`faiss_utils.py`**: Provides FAISS vector database operations for similarity search and indexing | |
| - **`session_utils.py`**: Handles session serialization, deserialization, and data persistence | |
| - **`text_processing.py`**: PDF text extraction, intelligent chunking, and preprocessing utilities | |
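The project structure lists `*_session.pkl` files under `uploads/`, which suggests that sessions are persisted with `pickle`. A minimal sketch of that pattern follows. The function names and the shape of the session data are hypothetical and may differ from what `session_utils.py` actually does.

```python
import pickle
from pathlib import Path


def session_path(upload_dir, session_id):
    """Path of the pickle file for one session: <upload_dir>/<id>_session.pkl."""
    return Path(upload_dir) / f"{session_id}_session.pkl"


def save_session(upload_dir, session_id, data):
    """Serialize the session data to disk and return the file path."""
    path = session_path(upload_dir, session_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(data, fh)
    return path


def load_session(upload_dir, session_id):
    """Return the stored session data, or None if the session does not exist."""
    path = session_path(upload_dir, session_id)
    if not path.exists():
        return None
    with path.open("rb") as fh:
        # Only unpickle files this application wrote itself: pickle can
        # execute arbitrary code when loading untrusted data.
        return pickle.load(fh)
```

Keeping the session on disk lets chat history and document context survive a server restart, at the cost of having to clean up stale files when a PDF is removed.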
| #### Frontend (`static/`) | |
| - **`index.html`**: Responsive web interface with modern UI design | |
| - **`styles.css`**: CSS styling with mobile-first responsive design principles | |
| - **`app.js`**: JavaScript for dynamic interactions, file uploads, and chat functionality | |
| #### Mobile Application (`Android App/`) | |
| - **Native Android client**: WebView-based mobile application that interfaces with the web app | |
| - **Java source code**: Activity management, splash screen, and WebView configuration | |
| - **Android resources**: UI layouts, icons, and mobile-specific configurations | |
| ## Technical Stack | |
| ### Backend | |
| - **Python 3.12**: Core programming language | |
| - **FastAPI**: API framework with async support | |
| - **PyMuPDF**: PDF processing library | |
| - **LangChain**: Framework for LLM application development | |
| - **FAISS**: Vector similarity search library from Facebook AI | |
| - **Sentence Transformers**: Text embedding generation | |
| - **Groq API**: LLM inference service | |
| - **Tavily API**: Web search integration | |
| - **Uvicorn**: ASGI server | |
| ### Frontend | |
| - **HTML/CSS/JavaScript**: Core web technologies | |
| - **Font Awesome**: Icon library | |
| - **Highlight.js**: Code syntax highlighting | |
| - **Marked.js**: Markdown rendering | |
| - **Responsive Design**: Mobile-friendly interface | |
| *Note: The frontend was developed with assistance from Claude 3.7 AI.* | |
| ### DevOps | |
| - **Docker**: Containerization | |
| - **GitHub Actions**: CI/CD pipeline | |
| - **Hugging Face Spaces**: Deployment platform | |
| ## Installation | |
| ### Prerequisites | |
| - Python 3.12+ | |
| - API keys for Groq and Tavily | |
| ### Local Setup | |
| 1. Clone the repository: | |
| ```bash | |
| git clone https://github.com/Jatin-Mehra119/PDF-Insight-Beta.git | |
| cd PDF-Insight-Beta | |
| ``` | |
| 2. Create and activate a virtual environment: | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate # On Windows: venv\Scripts\activate | |
| ``` | |
| 3. Install dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 4. Create a `.env` file with your API keys: | |
| ``` | |
| GROQ_API_KEY=your_groq_api_key | |
| TAVILY_API_KEY=your_tavily_api_key | |
| ``` | |
| 5. Run the application: | |
| ```bash | |
| uvicorn app:app --host 0.0.0.0 --port 8000 --reload | |
| ``` | |
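Python does not read a `.env` file on its own. The keys have to reach the process environment first, for example through a loader such as `python-dotenv` or by exporting them in the shell. Whether this project uses such a loader is an assumption. Once the variables are set, a startup check like the hypothetical `load_api_keys` below fails early with a clear message instead of failing on the first LLM call.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def load_api_keys(env=None):
    """Return the required API keys, raising a clear error if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```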
| ### Docker Deployment | |
| 1. Build the Docker image: | |
| ```bash | |
| docker build -t pdf-insight-pro . | |
| ``` | |
| 2. Run the container: | |
| ```bash | |
| # Pass the API keys as environment variables, reusing the .env file from the local setup | |
| # (assumes the app reads GROQ_API_KEY and TAVILY_API_KEY from the environment) | |
| docker run -p 7860:7860 \ | |
| --env-file .env \ | |
| pdf-insight-pro | |
| ``` | |
| ## Usage | |
| 1. Open the application in your browser at `http://localhost:8000` (or `http://localhost:7860` when running the Docker container) | |
| 2. Upload a PDF document using the interface | |
| 3. Wait for processing to complete | |
| 4. Ask questions about the document in the chat interface | |
| 5. Toggle the "Use web search" option for enhanced responses | |
| ## API Endpoints | |
| - **GET `/`**: Redirect to static HTML interface | |
| - **POST `/upload-pdf`**: Upload and process a PDF document | |
| - Returns a session ID for subsequent queries | |
| - **POST `/chat`**: Send a query about the uploaded document | |
| - Requires session ID from previous upload | |
| - Optional parameter to enable web search | |
| - **POST `/chat-history`**: Retrieve chat history for a session | |
| - **POST `/clear-history`**: Clear chat history for a session | |
| - **POST `/remove-pdf`**: Remove PDF and session data | |
| - **GET `/models`**: List available language models | |
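As a rough client-side illustration, the sketch below prepares a `POST /chat` request with the standard library. The JSON field names (`session_id`, `query`, `use_search`) are hypothetical, since this README does not document the request schema. Check `models/models.py` for the actual fields before relying on them.

```python
import json
import urllib.request


def build_chat_request(base_url, session_id, query, use_search=False):
    """Build (but do not send) a POST /chat request with a JSON body."""
    payload = {
        "session_id": session_id,  # hypothetical field name
        "query": query,            # hypothetical field name
        "use_search": use_search,  # hypothetical field name
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `send(build_chat_request("http://localhost:8000", session_id, "Summarize section 2"))` would then return the decoded response, where `session_id` is the value returned by `/upload-pdf`.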
| ## Deployment | |
| ### Hugging Face Spaces | |
| This project is configured for automatic deployment to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/sync_to_hf.yml` handles the deployment process. | |
| To deploy to your own space: | |
| 1. Fork this repository | |
| 2. Create a Hugging Face Space | |
| 3. Add your Hugging Face token as a GitHub secret named `HF_TOKEN` | |
| 4. Update the username and space name in the workflow file | |
| 5. Push to the main branch to trigger deployment | |
| ## Android App | |
| The repository includes an Android application that serves as a mobile interface to the web application. Rather than implementing a native client with direct API integration, the Android app uses a WebView component to load the deployed web interface from Hugging Face Spaces. This keeps the web and mobile experiences consistent and reduces maintenance overhead. | |
| ### Android App Features | |
| - WebView integration to the deployed web application | |
| - Splash screen with app branding | |
| - Responsive design that adapts to the mobile interface | |
| - Native Android navigation and user experience | |
| - Direct access to the full functionality of the web application | |
| ### Implementation Details | |
| The Android app is implemented using Java and consists of: | |
| - SplashActivity: Displays the app logo and transitions to the main activity | |
| - MainActivity: Contains a WebView component that loads the deployed web application | |
| - WebView configuration: Enables JavaScript, DOM storage, and handles file uploads | |
| ## License | |
| MIT |