Jatin Mehra committed
Commit · 0fe16fd
1 Parent(s): 9dbace0
Add project structure and key components description to README files for improved documentation
- README.md +104 -0
- README_hf.md +104 -0
README.md
CHANGED
@@ -114,6 +114,109 @@ The application follows a modular architecture with these main components:
 - Response is returned to the user
 - Chat history is updated and persisted
 
+## Project Structure
+
+The project is organized into a modular architecture with clear separation of concerns:
+
+```
+PDF-Insight-Beta/
+├── app.py                          # Main FastAPI application entry point
+├── gen_dataset.py                  # Dataset generation and RAG evaluation scripts
+├── test_RAG.ipynb                  # Jupyter notebook for RAG system testing and metrics
+├── requirements.txt                # Python dependencies
+├── Dockerfile                      # Container configuration for deployment
+├── LICENSE                         # MIT license file
+├── README.md                       # Project documentation
+├── README_hf.md                    # Hugging Face Spaces specific documentation
+├──
+├── api/                            # API route handlers (modular FastAPI routes)
+│   ├── __init__.py                 # Exports all route handlers
+│   ├── chat_routes.py              # Chat and conversation management endpoints
+│   ├── session_routes.py           # Session lifecycle management
+│   ├── upload_routes.py            # PDF upload and processing endpoints
+│   └── utility_routes.py           # Utility endpoints (models, health checks)
+├──
+├── configs/                        # Configuration management
+│   └── config.py                   # Centralized configuration and environment variables
+├──
+├── models/                         # Pydantic data models
+│   └── models.py                   # Request/response models for API validation
+├──
+├── services/                       # Core business logic services
+│   ├── __init__.py                 # Service module initialization
+│   ├── llm_service.py              # Language model integration and management
+│   ├── rag_service.py              # RAG implementation with agentic capabilities
+│   └── session_service.py          # Session persistence and management
+├──
+├── utils/                          # Utility functions and helpers
+│   ├── __init__.py                 # Utility module initialization
+│   ├── faiss_utils.py              # FAISS vector database operations
+│   ├── session_utils.py            # Session data serialization/deserialization
+│   └── text_processing.py          # PDF text extraction and chunking utilities
+├──
+├── static/                         # Frontend web application
+│   ├── index.html                  # Main web interface
+│   ├── css/
+│   │   └── styles.css              # Application styling and responsive design
+│   └── js/
+│       └── app.js                  # Frontend JavaScript for user interactions
+├──
+├── development_scripts/            # Legacy and development utilities
+│   ├── app.py                      # Original monolithic application (deprecated)
+│   └── preprocessing.py            # Original preprocessing functions (deprecated)
+├──
+├── uploads/                        # Temporary storage for uploaded files and sessions
+│   ├── *.pdf                       # Uploaded PDF documents
+│   └── *_session.pkl               # Serialized session data
+├──
+└── Android App/                    # Native Android application
+    ├── app/                        # Android app source code
+    │   ├── src/main/java/com/jatinmehra/   # Java source files
+    │   ├── src/main/res/           # Android resources (layouts, drawables, etc.)
+    │   └── AndroidManifest.xml     # Android app configuration
+    ├── gradle/                     # Gradle build system files
+    └── build.gradle                # Project build configuration
+```
+
+### Key Components Description
+
+#### Core Application Files
+- **`app.py`**: Main FastAPI application that orchestrates all components and sets up the web server
+- **`gen_dataset.py`**: Comprehensive evaluation script for RAG system performance using the neural-bridge dataset
+- **`test_RAG.ipynb`**: Interactive Jupyter notebook for testing RAG capabilities and analyzing metrics
+
+#### API Layer (`api/`)
+- **`chat_routes.py`**: Handles chat interactions, query processing, and conversation flow
+- **`session_routes.py`**: Manages session lifecycle, history retrieval, and cleanup operations
+- **`upload_routes.py`**: Processes PDF uploads, text extraction, and document indexing
+- **`utility_routes.py`**: Provides system utilities like model listing and health checks
+
+#### Configuration (`configs/`)
+- **`config.py`**: Centralizes all application settings, API keys, model configurations, and environment variables
+
+#### Data Models (`models/`)
+- **`models.py`**: Defines Pydantic models for request/response validation and API documentation
+
+#### Business Logic (`services/`)
+- **`llm_service.py`**: Manages language model interactions, prompt engineering, and response generation
+- **`rag_service.py`**: Implements the core RAG pipeline with agentic search capabilities and tool integration
+- **`session_service.py`**: Handles session persistence, chat history, and user context management
+
+#### Utilities (`utils/`)
+- **`faiss_utils.py`**: Provides FAISS vector database operations for similarity search and indexing
+- **`session_utils.py`**: Handles session serialization, deserialization, and data persistence
+- **`text_processing.py`**: PDF text extraction, intelligent chunking, and preprocessing utilities
+
+#### Frontend (`static/`)
+- **`index.html`**: Responsive web interface with modern UI design
+- **`styles.css`**: CSS styling with mobile-first responsive design principles
+- **`app.js`**: JavaScript for dynamic interactions, file uploads, and chat functionality
+
+#### Mobile Application (`Android App/`)
+- **Native Android client**: WebView-based mobile application that interfaces with the web app
+- **Java source code**: Activity management, splash screen, and WebView configuration
+- **Android resources**: UI layouts, icons, and mobile-specific configurations
+
 ## Technical Stack
 
 ### Backend
@@ -249,3 +352,4 @@ The Android app is implemented using Java and consists of:
 ## License
 
 MIT
+
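Review note: the README additions describe `uploads/*_session.pkl` as serialized session data and `utils/session_utils.py` as the module that serializes and deserializes it. The diff does not show that module, so the following is only a minimal sketch of what such a round trip might look like using the standard-library `pickle` module. The function names (`session_path`, `save_session`, `load_session`) and the shape of the session dictionary are assumptions made for illustration, not the project's actual API.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Optional


def session_path(upload_dir: Path, session_id: str) -> Path:
    """Build the `<session_id>_session.pkl` path (naming assumed from the README tree)."""
    return upload_dir / f"{session_id}_session.pkl"


def save_session(upload_dir: Path, session_id: str, data: dict) -> Path:
    """Pickle the session dict to disk and return the path that was written."""
    upload_dir.mkdir(parents=True, exist_ok=True)
    path = session_path(upload_dir, session_id)
    with path.open("wb") as fh:
        pickle.dump(data, fh)
    return path


def load_session(upload_dir: Path, session_id: str) -> Optional[dict]:
    """Return the stored session dict, or None when no file exists for this id."""
    path = session_path(upload_dir, session_id)
    if not path.exists():
        return None
    with path.open("rb") as fh:
        # pickle.load can execute arbitrary code; only read files the app wrote itself.
        return pickle.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        uploads = Path(tmp) / "uploads"
        save_session(uploads, "abc123", {"chat_history": [("user", "hi")]})
        print(load_session(uploads, "abc123"))
```

Because unpickling untrusted data is unsafe, a directory that also receives user uploads would need care to keep user-supplied files from being loaded as sessions.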
README_hf.md
CHANGED
@@ -123,6 +123,110 @@ The application follows a modular architecture with these main components:
 - Response is returned to the user
 - Chat history is updated and persisted
 
+## Project Structure
+
+The project is organized into a modular architecture with clear separation of concerns:
+
+```
+PDF-Insight-Beta/
+├── app.py                          # Main FastAPI application entry point
+├── gen_dataset.py                  # Dataset generation and RAG evaluation scripts
+├── test_RAG.ipynb                  # Jupyter notebook for RAG system testing and metrics
+├── requirements.txt                # Python dependencies
+├── Dockerfile                      # Container configuration for deployment
+├── LICENSE                         # MIT license file
+├── README.md                       # Project documentation
+├── README_hf.md                    # Hugging Face Spaces specific documentation
+├──
+├── api/                            # API route handlers (modular FastAPI routes)
+│   ├── __init__.py                 # Exports all route handlers
+│   ├── chat_routes.py              # Chat and conversation management endpoints
+│   ├── session_routes.py           # Session lifecycle management
+│   ├── upload_routes.py            # PDF upload and processing endpoints
+│   └── utility_routes.py           # Utility endpoints (models, health checks)
+├──
+├── configs/                        # Configuration management
+│   └── config.py                   # Centralized configuration and environment variables
+├──
+├── models/                         # Pydantic data models
+│   └── models.py                   # Request/response models for API validation
+├──
+├── services/                       # Core business logic services
+│   ├── __init__.py                 # Service module initialization
+│   ├── llm_service.py              # Language model integration and management
+│   ├── rag_service.py              # RAG implementation with agentic capabilities
+│   └── session_service.py          # Session persistence and management
+├──
+├── utils/                          # Utility functions and helpers
+│   ├── __init__.py                 # Utility module initialization
+│   ├── faiss_utils.py              # FAISS vector database operations
+│   ├── session_utils.py            # Session data serialization/deserialization
+│   └── text_processing.py          # PDF text extraction and chunking utilities
+├──
+├── static/                         # Frontend web application
+│   ├── index.html                  # Main web interface
+│   ├── css/
+│   │   └── styles.css              # Application styling and responsive design
+│   └── js/
+│       └── app.js                  # Frontend JavaScript for user interactions
+├──
+├── development_scripts/            # Legacy and development utilities
+│   ├── app.py                      # Original monolithic application (deprecated)
+│   └── preprocessing.py            # Original preprocessing functions (deprecated)
+├──
+├── uploads/                        # Temporary storage for uploaded files and sessions
+│   ├── *.pdf                       # Uploaded PDF documents
+│   └── *_session.pkl               # Serialized session data
+├──
+└── Android App/                    # Native Android application
+    ├── app/                        # Android app source code
+    │   ├── src/main/java/com/jatinmehra/   # Java source files
+    │   ├── src/main/res/           # Android resources (layouts, drawables, etc.)
+    │   └── AndroidManifest.xml     # Android app configuration
+    ├── gradle/                     # Gradle build system files
+    └── build.gradle                # Project build configuration
+```
+
+### Key Components Description
+
+#### Core Application Files
+- **`app.py`**: Main FastAPI application that orchestrates all components and sets up the web server
+- **`gen_dataset.py`**: Comprehensive evaluation script for RAG system performance using the neural-bridge dataset
+- **`test_RAG.ipynb`**: Interactive Jupyter notebook for testing RAG capabilities and analyzing metrics
+
+#### API Layer (`api/`)
+- **`chat_routes.py`**: Handles chat interactions, query processing, and conversation flow
+- **`session_routes.py`**: Manages session lifecycle, history retrieval, and cleanup operations
+- **`upload_routes.py`**: Processes PDF uploads, text extraction, and document indexing
+- **`utility_routes.py`**: Provides system utilities like model listing and health checks
+
+#### Configuration (`configs/`)
+- **`config.py`**: Centralizes all application settings, API keys, model configurations, and environment variables
+
+#### Data Models (`models/`)
+- **`models.py`**: Defines Pydantic models for request/response validation and API documentation
+
+#### Business Logic (`services/`)
+- **`llm_service.py`**: Manages language model interactions, prompt engineering, and response generation
+- **`rag_service.py`**: Implements the core RAG pipeline with agentic search capabilities and tool integration
+- **`session_service.py`**: Handles session persistence, chat history, and user context management
+
+#### Utilities (`utils/`)
+- **`faiss_utils.py`**: Provides FAISS vector database operations for similarity search and indexing
+- **`session_utils.py`**: Handles session serialization, deserialization, and data persistence
+- **`text_processing.py`**: PDF text extraction, intelligent chunking, and preprocessing utilities
+
+#### Frontend (`static/`)
+- **`index.html`**: Responsive web interface with modern UI design
+- **`styles.css`**: CSS styling with mobile-first responsive design principles
+- **`app.js`**: JavaScript for dynamic interactions, file uploads, and chat functionality
+
+#### Mobile Application (`Android App/`)
+- **Native Android client**: WebView-based mobile application that interfaces with the web app
+- **Java source code**: Activity management, splash screen, and WebView configuration
+- **Android resources**: UI layouts, icons, and mobile-specific configurations
+
+
 ## Technical Stack
 
 ### Backend
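Review note: both READMEs describe `utils/text_processing.py` as providing "intelligent chunking", but the commit does not show how chunks are produced. As a point of reference only, here is a minimal sketch of the simplest common approach, fixed-size character windows with overlap. It is a stand-in technique, not the project's implementation; the function name `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into windows of up to chunk_size characters.

    Consecutive windows share `overlap` characters so that a sentence cut at
    a boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
    # → ['abcd', 'defg', 'ghij']
```

A production chunker would more likely split on sentence or paragraph boundaries before embedding the chunks for FAISS; the overlap idea carries over unchanged.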