Jatin Mehra committed
Commit 0fe16fd · 1 Parent(s): 9dbace0

Add project structure and key components description to README files for improved documentation

Files changed (2)
  1. README.md +104 -0
  2. README_hf.md +104 -0
README.md CHANGED
@@ -114,6 +114,109 @@ The application follows a modular architecture with these main components:
  - Response is returned to the user
  - Chat history is updated and persisted

+ ## Project Structure
+
+ The project is organized into a modular architecture with clear separation of concerns:
+
+ ```
+ PDF-Insight-Beta/
+ ├── app.py                      # Main FastAPI application entry point
+ ├── gen_dataset.py              # Dataset generation and RAG evaluation scripts
+ ├── test_RAG.ipynb              # Jupyter notebook for RAG system testing and metrics
+ ├── requirements.txt            # Python dependencies
+ ├── Dockerfile                  # Container configuration for deployment
+ ├── LICENSE                     # MIT license file
+ ├── README.md                   # Project documentation
+ ├── README_hf.md                # Hugging Face Spaces specific documentation
+ ├──
+ ├── api/                        # API route handlers (modular FastAPI routes)
+ │   ├── __init__.py             # Exports all route handlers
+ │   ├── chat_routes.py          # Chat and conversation management endpoints
+ │   ├── session_routes.py       # Session lifecycle management
+ │   ├── upload_routes.py        # PDF upload and processing endpoints
+ │   └── utility_routes.py       # Utility endpoints (models, health checks)
+ ├──
+ ├── configs/                    # Configuration management
+ │   └── config.py               # Centralized configuration and environment variables
+ ├──
+ ├── models/                     # Pydantic data models
+ │   └── models.py               # Request/response models for API validation
+ ├──
+ ├── services/                   # Core business logic services
+ │   ├── __init__.py             # Service module initialization
+ │   ├── llm_service.py          # Language model integration and management
+ │   ├── rag_service.py          # RAG implementation with agentic capabilities
+ │   └── session_service.py      # Session persistence and management
+ ├──
+ ├── utils/                      # Utility functions and helpers
+ │   ├── __init__.py             # Utility module initialization
+ │   ├── faiss_utils.py          # FAISS vector database operations
+ │   ├── session_utils.py        # Session data serialization/deserialization
+ │   └── text_processing.py      # PDF text extraction and chunking utilities
+ ├──
+ ├── static/                     # Frontend web application
+ │   ├── index.html              # Main web interface
+ │   ├── css/
+ │   │   └── styles.css          # Application styling and responsive design
+ │   └── js/
+ │       └── app.js              # Frontend JavaScript for user interactions
+ ├──
+ ├── development_scripts/        # Legacy and development utilities
+ │   ├── app.py                  # Original monolithic application (deprecated)
+ │   └── preprocessing.py        # Original preprocessing functions (deprecated)
+ ├──
+ ├── uploads/                    # Temporary storage for uploaded files and sessions
+ │   ├── *.pdf                   # Uploaded PDF documents
+ │   └── *_session.pkl           # Serialized session data
+ ├──
+ └── Android App/                # Native Android application
+     ├── app/                                # Android app source code
+     │   ├── src/main/java/com/jatinmehra/  # Java source files
+     │   ├── src/main/res/                   # Android resources (layouts, drawables, etc.)
+     │   └── AndroidManifest.xml             # Android app configuration
+     ├── gradle/                             # Gradle build system files
+     └── build.gradle                        # Project build configuration
+ ```
+
+ ### Key Components Description
+
+ #### Core Application Files
+ - **`app.py`**: Main FastAPI application that orchestrates all components and sets up the web server
+ - **`gen_dataset.py`**: Comprehensive evaluation script for RAG system performance using the neural-bridge dataset
+ - **`test_RAG.ipynb`**: Interactive Jupyter notebook for testing RAG capabilities and analyzing metrics
+
+ #### API Layer (`api/`)
+ - **`chat_routes.py`**: Handles chat interactions, query processing, and conversation flow
+ - **`session_routes.py`**: Manages session lifecycle, history retrieval, and cleanup operations
+ - **`upload_routes.py`**: Processes PDF uploads, text extraction, and document indexing
+ - **`utility_routes.py`**: Provides system utilities like model listing and health checks
+
+ #### Configuration (`configs/`)
+ - **`config.py`**: Centralizes all application settings, API keys, model configurations, and environment variables
+
+ #### Data Models (`models/`)
+ - **`models.py`**: Defines Pydantic models for request/response validation and API documentation
+
+ #### Business Logic (`services/`)
+ - **`llm_service.py`**: Manages language model interactions, prompt engineering, and response generation
+ - **`rag_service.py`**: Implements the core RAG pipeline with agentic search capabilities and tool integration
+ - **`session_service.py`**: Handles session persistence, chat history, and user context management
+
+ #### Utilities (`utils/`)
+ - **`faiss_utils.py`**: Provides FAISS vector database operations for similarity search and indexing
+ - **`session_utils.py`**: Handles session serialization, deserialization, and data persistence
+ - **`text_processing.py`**: PDF text extraction, intelligent chunking, and preprocessing utilities
+
+ #### Frontend (`static/`)
+ - **`index.html`**: Responsive web interface with modern UI design
+ - **`styles.css`**: CSS styling with mobile-first responsive design principles
+ - **`app.js`**: JavaScript for dynamic interactions, file uploads, and chat functionality
+
+ #### Mobile Application (`Android App/`)
+ - **Native Android client**: WebView-based mobile application that interfaces with the web app
+ - **Java source code**: Activity management, splash screen, and WebView configuration
+ - **Android resources**: UI layouts, icons, and mobile-specific configurations
+
  ## Technical Stack

  ### Backend
@@ -249,3 +352,4 @@ The Android app is implemented using Java and consists of:
  ## License

  MIT
+
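The tree added above lists `uploads/*_session.pkl` as serialized session data and describes `utils/session_utils.py` as handling serialization and deserialization. A minimal sketch of what such a round trip could look like, assuming a plain `pickle` dump keyed by session ID. The function names `session_path`, `save_session`, and `load_session`, the `uploads` default, and the shape of the session dictionary are illustrative assumptions, not taken from the repository:

```python
import pickle
from pathlib import Path
from typing import Optional


def session_path(session_id: str, upload_dir: str = "uploads") -> Path:
    """Build a `<session_id>_session.pkl` path (naming assumed from the tree)."""
    return Path(upload_dir) / f"{session_id}_session.pkl"


def save_session(session_id: str, data: dict, upload_dir: str = "uploads") -> Path:
    """Pickle the session dictionary to disk and return the file path."""
    path = session_path(session_id, upload_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(data, fh)
    return path


def load_session(session_id: str, upload_dir: str = "uploads") -> Optional[dict]:
    """Return the stored session dictionary, or None if no file exists."""
    path = session_path(session_id, upload_dir)
    if not path.exists():
        return None
    with path.open("rb") as fh:
        return pickle.load(fh)
```

Because `pickle.load` can execute arbitrary code from a crafted file, a scheme like this is only reasonable when the application reads back files it wrote itself.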
README_hf.md CHANGED
@@ -123,6 +123,110 @@ The application follows a modular architecture with these main components:
  - Response is returned to the user
  - Chat history is updated and persisted

+ ## Project Structure
+
+ The project is organized into a modular architecture with clear separation of concerns:
+
+ ```
+ PDF-Insight-Beta/
+ ├── app.py                      # Main FastAPI application entry point
+ ├── gen_dataset.py              # Dataset generation and RAG evaluation scripts
+ ├── test_RAG.ipynb              # Jupyter notebook for RAG system testing and metrics
+ ├── requirements.txt            # Python dependencies
+ ├── Dockerfile                  # Container configuration for deployment
+ ├── LICENSE                     # MIT license file
+ ├── README.md                   # Project documentation
+ ├── README_hf.md                # Hugging Face Spaces specific documentation
+ ├──
+ ├── api/                        # API route handlers (modular FastAPI routes)
+ │   ├── __init__.py             # Exports all route handlers
+ │   ├── chat_routes.py          # Chat and conversation management endpoints
+ │   ├── session_routes.py       # Session lifecycle management
+ │   ├── upload_routes.py        # PDF upload and processing endpoints
+ │   └── utility_routes.py       # Utility endpoints (models, health checks)
+ ├──
+ ├── configs/                    # Configuration management
+ │   └── config.py               # Centralized configuration and environment variables
+ ├──
+ ├── models/                     # Pydantic data models
+ │   └── models.py               # Request/response models for API validation
+ ├──
+ ├── services/                   # Core business logic services
+ │   ├── __init__.py             # Service module initialization
+ │   ├── llm_service.py          # Language model integration and management
+ │   ├── rag_service.py          # RAG implementation with agentic capabilities
+ │   └── session_service.py      # Session persistence and management
+ ├──
+ ├── utils/                      # Utility functions and helpers
+ │   ├── __init__.py             # Utility module initialization
+ │   ├── faiss_utils.py          # FAISS vector database operations
+ │   ├── session_utils.py        # Session data serialization/deserialization
+ │   └── text_processing.py      # PDF text extraction and chunking utilities
+ ├──
+ ├── static/                     # Frontend web application
+ │   ├── index.html              # Main web interface
+ │   ├── css/
+ │   │   └── styles.css          # Application styling and responsive design
+ │   └── js/
+ │       └── app.js              # Frontend JavaScript for user interactions
+ ├──
+ ├── development_scripts/        # Legacy and development utilities
+ │   ├── app.py                  # Original monolithic application (deprecated)
+ │   └── preprocessing.py        # Original preprocessing functions (deprecated)
+ ├──
+ ├── uploads/                    # Temporary storage for uploaded files and sessions
+ │   ├── *.pdf                   # Uploaded PDF documents
+ │   └── *_session.pkl           # Serialized session data
+ ├──
+ └── Android App/                # Native Android application
+     ├── app/                                # Android app source code
+     │   ├── src/main/java/com/jatinmehra/  # Java source files
+     │   ├── src/main/res/                   # Android resources (layouts, drawables, etc.)
+     │   └── AndroidManifest.xml             # Android app configuration
+     ├── gradle/                             # Gradle build system files
+     └── build.gradle                        # Project build configuration
+ ```
+
+ ### Key Components Description
+
+ #### Core Application Files
+ - **`app.py`**: Main FastAPI application that orchestrates all components and sets up the web server
+ - **`gen_dataset.py`**: Comprehensive evaluation script for RAG system performance using the neural-bridge dataset
+ - **`test_RAG.ipynb`**: Interactive Jupyter notebook for testing RAG capabilities and analyzing metrics
+
+ #### API Layer (`api/`)
+ - **`chat_routes.py`**: Handles chat interactions, query processing, and conversation flow
+ - **`session_routes.py`**: Manages session lifecycle, history retrieval, and cleanup operations
+ - **`upload_routes.py`**: Processes PDF uploads, text extraction, and document indexing
+ - **`utility_routes.py`**: Provides system utilities like model listing and health checks
+
+ #### Configuration (`configs/`)
+ - **`config.py`**: Centralizes all application settings, API keys, model configurations, and environment variables
+
+ #### Data Models (`models/`)
+ - **`models.py`**: Defines Pydantic models for request/response validation and API documentation
+
+ #### Business Logic (`services/`)
+ - **`llm_service.py`**: Manages language model interactions, prompt engineering, and response generation
+ - **`rag_service.py`**: Implements the core RAG pipeline with agentic search capabilities and tool integration
+ - **`session_service.py`**: Handles session persistence, chat history, and user context management
+
+ #### Utilities (`utils/`)
+ - **`faiss_utils.py`**: Provides FAISS vector database operations for similarity search and indexing
+ - **`session_utils.py`**: Handles session serialization, deserialization, and data persistence
+ - **`text_processing.py`**: PDF text extraction, intelligent chunking, and preprocessing utilities
+
+ #### Frontend (`static/`)
+ - **`index.html`**: Responsive web interface with modern UI design
+ - **`styles.css`**: CSS styling with mobile-first responsive design principles
+ - **`app.js`**: JavaScript for dynamic interactions, file uploads, and chat functionality
+
+ #### Mobile Application (`Android App/`)
+ - **Native Android client**: WebView-based mobile application that interfaces with the web app
+ - **Java source code**: Activity management, splash screen, and WebView configuration
+ - **Android resources**: UI layouts, icons, and mobile-specific configurations
+
+
  ## Technical Stack

  ### Backend