Really-amin committed on
Commit 4a1218e · verified · 1 Parent(s): e49fcd0

Upload 60 files

.dockerignore CHANGED
@@ -59,11 +59,15 @@ Thumbs.db
  *.log
  logs/

- # Database
+ # Database (exclude old database files, but allow /app/data directory)
  *.db
  *.sqlite
  *.sqlite3

+ # Cache directories (exclude to prevent permission issues)
+ cache/
+ /app/cache/
+
  # Temporary files
  tmp/
  temp/
Dockerfile CHANGED
@@ -1,32 +1,25 @@
- FROM python:3.10-slim
-
- # Set working directory
- WORKDIR /app
-
- # Install required system packages
- RUN apt-get update && apt-get install -y \
-     build-essential \
-     wget \
-     curl \
-     poppler-utils \
-     tesseract-ocr \
-     libgl1 \
-     locales \
-     && rm -rf /var/lib/apt/lists/*
-
- # Set UTF-8 locale
- ENV LANG=C.UTF-8
- ENV LC_ALL=C.UTF-8
-
- # Copy all project files
- COPY . .
-
- # Upgrade pip and install Python dependencies
- RUN pip install --upgrade pip && \
-     pip install --no-cache-dir -r requirements.txt
-
- # Expose FastAPI port
- EXPOSE 7860
-
- # Run FastAPI app
- CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ # Install required system packages
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     poppler-utils \
+     tesseract-ocr \
+     libgl1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create volume-safe directories with proper permissions
+ RUN mkdir -p /app/data /app/cache && chmod -R 777 /app/data /app/cache
+
+ # Copy all project files
+ COPY . .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ EXPOSE 7860
+
+ # Run FastAPI app
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
RUNTIME_FIXES_SUMMARY.md ADDED
@@ -0,0 +1,149 @@
+ # Runtime Fixes Summary
+
+ ## Overview
+ This document summarizes the fixes applied to resolve runtime errors in the Legal Dashboard OCR application, specifically addressing:
+
+ 1. **SQLite Database Path Issues** (`sqlite3.OperationalError: unable to open database file`)
+ 2. **Hugging Face Transformers Cache Permissions** (`/.cache` not writable)
+
+ ## 🔧 Fixes Applied
+
+ ### 1. SQLite Database Path Fix
+
+ **File Modified:** `app/services/database_service.py`
+
+ **Changes:**
+ - Updated default database path from `"legal_documents.db"` to `"/app/data/database.db"`
+ - Added directory creation with `os.makedirs(os.path.dirname(self.db_path), exist_ok=True)`
+ - Added `check_same_thread=False` so the connection can be used from more than one thread (this disables sqlite3's same-thread check; it does not add locking)
+
+ **Code Changes:**
+ ```python
+ def __init__(self, db_path: str = "/app/data/database.db"):
+     self.db_path = db_path
+     self.connection = None
+     # Create directory if it doesn't exist
+     os.makedirs(os.path.dirname(self.db_path), exist_ok=True)
+     self._init_database()
+
+ def _init_database(self):
+     """Initialize database and create tables"""
+     try:
+         self.connection = sqlite3.connect(self.db_path, check_same_thread=False)
+         # ... rest of initialization
+ ```
+
+ ### 2. Hugging Face Cache Permissions Fix
+
+ **File Modified:** `app/main.py`
+
+ **Changes:**
+ - Added environment variable setting for `TRANSFORMERS_CACHE`
+ - Created cache directory with proper permissions
+ - Ensured cache directory is writable inside container
+
+ **Code Changes:**
+ ```python
+ # Set HF cache to a writable path inside container
+ os.environ["TRANSFORMERS_CACHE"] = "/app/cache"
+ os.makedirs("/app/cache", exist_ok=True)
+ ```
+
+ ### 3. Dockerfile Updates
+
+ **File Modified:** `Dockerfile`
+
+ **Changes:**
+ - Added directory creation for `/app/data` and `/app/cache`
+ - Set proper permissions (777) for both directories
+ - Ensured directories are created before copying application files
+
+ **Code Changes:**
+ ```dockerfile
+ # Create volume-safe directories with proper permissions
+ RUN mkdir -p /app/data /app/cache && chmod -R 777 /app/data /app/cache
+ ```
+
+ ### 4. Docker Ignore Updates
+
+ **File Modified:** `.dockerignore`
+
+ **Changes:**
+ - Added cache directory exclusions to prevent permission issues
+ - Preserved data directory for database persistence
+ - Excluded old database files while allowing new structure
+
+ **Code Changes:**
+ ```
+ # Cache directories (exclude to prevent permission issues)
+ cache/
+ /app/cache/
+ ```
+
+ ## 🎯 Expected Results
+
+ After applying these fixes, the application should:
+
+ 1. **Database Operations:**
+    - Successfully create and access SQLite database at `/app/data/database.db`
+    - No more `sqlite3.OperationalError: unable to open database file` errors
+    - Database persists across container restarts
+
+ 2. **Hugging Face Models:**
+    - Successfully download and cache models in `/app/cache`
+    - No more cache permission errors
+    - Models load correctly on first run
+
+ 3. **Container Deployment:**
+    - Builds successfully on Hugging Face Docker SDK
+    - Runs without permission-related runtime errors
+    - Maintains data persistence in volume-safe directories
+
+ ## 🧪 Validation
+
+ A validation script has been created (`validate_fixes.py`) that tests:
+
+ - Database path creation and access
+ - Cache directory setup and permissions
+ - Dockerfile configuration
+ - Docker ignore settings
+
+ Run the validation script to verify all fixes are working:
+
+ ```bash
+ cd legal_dashboard_ocr
+ python validate_fixes.py
+ ```
+
+ ## 📁 Directory Structure
+
+ After fixes, the container will have this structure:
+
+ ```
+ /app/
+ ├── data/                 # Database storage (persistent)
+ │   └── database.db
+ ├── cache/                # HF model cache (persistent)
+ │   └── transformers/
+ ├── app/                  # Application code
+ ├── frontend/             # Frontend files
+ └── requirements.txt
+ ```
+
+ ## 🔒 Security Considerations
+
+ - Database and cache directories have 777 permissions for container compatibility
+ - In production, consider more restrictive permissions if security is a concern
+ - Database files are stored in persistent volumes
+ - Cache can be cleared without affecting application functionality
+
+ ## 🚀 Deployment
+
+ The application is now ready for deployment on Hugging Face Spaces with:
+
+ 1. **No database initialization errors**
+ 2. **No cache permission errors**
+ 3. **Persistent data storage**
+ 4. **Proper model caching**
+
+ All runtime errors related to file permissions and database access should be resolved.
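
The Security Considerations section of the summary suggests tighter permissions for production. A minimal sketch of what that could look like from Python, assuming a POSIX filesystem and that the process creating the directory is also the one that reads and writes it. `ensure_private_dir` is a hypothetical helper and is not part of this commit:

```python
import os
import stat
import tempfile


def ensure_private_dir(path: str, mode: int = 0o700) -> str:
    """Create path if needed and restrict it to the owning user (hypothetical helper)."""
    os.makedirs(path, exist_ok=True)
    # The mode argument of os.makedirs is filtered by the umask, so the
    # permission bits are set explicitly afterwards.
    os.chmod(path, mode)
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    data_dir = ensure_private_dir(os.path.join(base, "data"))
    # Show the resulting permission bits of the new directory.
    print(oct(stat.S_IMODE(os.stat(data_dir).st_mode)))
```

Whether a mode this strict works depends on which user the container runs as. If the build user and the runtime user differ, the runtime user may still need group or world access, which is presumably why the commit uses 777.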
app/main.py CHANGED
@@ -8,24 +8,28 @@ Features real-time document processing, AI scoring, and WebSocket support.
  Run with: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
  """

+ from .models.document_models import LegalDocument
+ from .services.ai_service import AIScoringEngine
+ from .services.database_service import DatabaseManager
+ from .services.ocr_service import OCRPipeline
+ from .api import documents, ocr, dashboard
  import asyncio
  import logging
+ import os
  from fastapi import FastAPI, HTTPException, BackgroundTasks, WebSocket, WebSocketDisconnect, UploadFile, File
  from fastapi.middleware.cors import CORSMiddleware
  from fastapi.responses import HTMLResponse, JSONResponse
  from fastapi.staticfiles import StaticFiles
  import uvicorn
  from pydantic import BaseModel
- import os
  import tempfile
  from pathlib import Path

+ # Set HF cache to a writable path inside container
+ os.environ["TRANSFORMERS_CACHE"] = "/app/cache"
+ os.makedirs("/app/cache", exist_ok=True)
+
  # Import our modules
- from .api import documents, ocr, dashboard
- from .services.ocr_service import OCRPipeline
- from .services.database_service import DatabaseManager
- from .services.ai_service import AIScoringEngine
- from .models.document_models import LegalDocument

  # Configure logging
  logging.basicConfig(
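
In the new version of `app/main.py` the project imports sit above the `TRANSFORMERS_CACHE` assignment. If any of those modules imports `transformers` at import time, the library may resolve its cache location before the variable is set. A minimal sketch of an ordering that avoids this, assuming nothing has imported `transformers` yet. `configure_hf_cache` is a hypothetical helper, and setting `HF_HOME` alongside `TRANSFORMERS_CACHE` is an assumption based on newer Transformers releases preferring `HF_HOME`:

```python
import os
import tempfile


def configure_hf_cache(cache_dir: str) -> str:
    """Create cache_dir and point the Hugging Face cache variables at it.

    Hypothetical helper: call it before any module that imports
    `transformers` is itself imported.
    """
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["TRANSFORMERS_CACHE"] = cache_dir
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    # A temporary directory stands in for /app/cache in this sketch.
    cache = configure_hf_cache(os.path.join(tempfile.mkdtemp(), "cache"))
    print("cache configured at", cache)
    # Imports that may pull in transformers would follow here, e.g.
    # from app.services.ocr_service import OCRPipeline
```

An alternative with the same effect is an `ENV` line in the Dockerfile, which sets the variable before the Python process starts and so does not depend on import order.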
app/services/database_service.py CHANGED
@@ -8,6 +8,7 @@ SQLite database management for legal documents with AI scoring.
  import sqlite3
  import json
  import logging
+ import os
  from typing import List, Dict, Optional, Any
  from datetime import datetime, timedelta
  from pathlib import Path
@@ -19,15 +20,18 @@ logger = logging.getLogger(__name__)
  class DatabaseManager:
      """Database manager for legal documents"""

-     def __init__(self, db_path: str = "legal_documents.db"):
+     def __init__(self, db_path: str = "/app/data/database.db"):
          self.db_path = db_path
          self.connection = None
+         # Create directory if it doesn't exist
+         os.makedirs(os.path.dirname(self.db_path), exist_ok=True)
          self._init_database()

      def _init_database(self):
          """Initialize database and create tables"""
          try:
-             self.connection = sqlite3.connect(self.db_path)
+             self.connection = sqlite3.connect(
+                 self.db_path, check_same_thread=False)
              self.connection.row_factory = sqlite3.Row

              # Create tables
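
One edge case in the new `__init__`: `os.path.dirname` returns an empty string for a bare filename such as the old default `"legal_documents.db"`, and `os.makedirs("")` raises `FileNotFoundError`. A minimal sketch of a guard, assuming callers may still pass a path without a directory component. `open_database` is a hypothetical standalone function, not the `DatabaseManager` method from this commit:

```python
import os
import sqlite3
import tempfile


def open_database(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database, creating its parent directory if one is given."""
    parent = os.path.dirname(db_path)
    if parent:
        # Skip makedirs for bare filenames, where dirname() is "".
        os.makedirs(parent, exist_ok=True)
    # check_same_thread=False only disables sqlite3's same-thread check;
    # callers still have to serialize concurrent writes themselves.
    conn = sqlite3.connect(db_path, check_same_thread=False)
    conn.row_factory = sqlite3.Row
    return conn


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data", "database.db")
    conn = open_database(path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY)")
    conn.commit()
    print("database file exists:", os.path.exists(path))
    conn.close()
```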
validate_fixes.py ADDED
@@ -0,0 +1,208 @@
+ #!/usr/bin/env python3
+ """
+ Validation Script for Database and Cache Fixes
+ =============================================
+
+ Tests the fixes for:
+ 1. SQLite database path issues
+ 2. Hugging Face cache permissions
+ """
+
+ import os
+ import sys
+ import tempfile
+ import shutil
+ from pathlib import Path
+
+
+ def test_database_path():
+     """Test database path creation and access"""
+     print("🔍 Testing database path fixes...")
+
+     try:
+         # Test the new database path
+         from app.services.database_service import DatabaseManager
+
+         # Test with default path (should be /app/data/database.db)
+         db = DatabaseManager()
+         print("✅ Database manager initialized with default path")
+
+         # Test if database directory exists
+         db_dir = os.path.dirname(db.db_path)
+         if os.path.exists(db_dir):
+             print(f"✅ Database directory exists: {db_dir}")
+         else:
+             print(f"❌ Database directory missing: {db_dir}")
+             return False
+
+         # Test database connection
+         if db.is_connected():
+             print("✅ Database connection successful")
+         else:
+             print("❌ Database connection failed")
+             return False
+
+         db.close()
+         return True
+
+     except Exception as e:
+         print(f"❌ Database test failed: {e}")
+         return False
+
+
+ def test_cache_directory():
+     """Test Hugging Face cache directory setup"""
+     print("\n🔍 Testing cache directory fixes...")
+
+     try:
+         # Check if cache directory is set
+         cache_dir = os.environ.get("TRANSFORMERS_CACHE")
+         if cache_dir:
+             print(f"✅ TRANSFORMERS_CACHE set to: {cache_dir}")
+         else:
+             print("❌ TRANSFORMERS_CACHE not set")
+             return False
+
+         # Check if cache directory exists and is writable
+         if os.path.exists(cache_dir):
+             print(f"✅ Cache directory exists: {cache_dir}")
+         else:
+             print(f"❌ Cache directory missing: {cache_dir}")
+             return False
+
+         # Test write permissions
+         test_file = os.path.join(cache_dir, "test_write.tmp")
+         try:
+             with open(test_file, 'w') as f:
+                 f.write("test")
+             os.remove(test_file)
+             print("✅ Cache directory is writable")
+         except Exception as e:
+             print(f"❌ Cache directory not writable: {e}")
+             return False
+
+         return True
+
+     except Exception as e:
+         print(f"❌ Cache test failed: {e}")
+         return False
+
+
+ def test_dockerfile_updates():
+     """Test Dockerfile changes"""
+     print("\n🔍 Testing Dockerfile updates...")
+
+     try:
+         dockerfile_path = "Dockerfile"
+         if not os.path.exists(dockerfile_path):
+             print("❌ Dockerfile not found")
+             return False
+
+         with open(dockerfile_path, 'r') as f:
+             content = f.read()
+
+         # Check for directory creation
+         if "mkdir -p /app/data /app/cache" in content:
+             print("✅ Directory creation command found")
+         else:
+             print("❌ Directory creation command missing")
+             return False
+
+         # Check for permissions
+         if "chmod -R 777 /app/data /app/cache" in content:
+             print("✅ Permission setting command found")
+         else:
+             print("❌ Permission setting command missing")
+             return False
+
+         return True
+
+     except Exception as e:
+         print(f"❌ Dockerfile test failed: {e}")
+         return False
+
+
+ def test_dockerignore_updates():
+     """Test .dockerignore updates"""
+     print("\n🔍 Testing .dockerignore updates...")
+
+     try:
+         dockerignore_path = ".dockerignore"
+         if not os.path.exists(dockerignore_path):
+             print("❌ .dockerignore not found")
+             return False
+
+         with open(dockerignore_path, 'r') as f:
+             content = f.read()
+
+         # Check for cache exclusions
+         if "cache/" in content:
+             print("✅ Cache directory exclusion found")
+         else:
+             print("❌ Cache directory exclusion missing")
+             return False
+
+         if "/app/cache/" in content:
+             print("✅ /app/cache exclusion found")
+         else:
+             print("❌ /app/cache exclusion missing")
+             return False
+
+         return True
+
+     except Exception as e:
+         print(f"❌ .dockerignore test failed: {e}")
+         return False
+
+
+ def main():
+     """Run all validation tests"""
+     print("🚀 Legal Dashboard OCR - Fix Validation")
+     print("=" * 50)
+
+     # Change to project directory
+     project_dir = Path(__file__).parent
+     os.chdir(project_dir)
+
+     # Run tests
+     tests = [
+         test_database_path,
+         test_cache_directory,
+         test_dockerfile_updates,
+         test_dockerignore_updates
+     ]
+
+     results = []
+     for test in tests:
+         try:
+             result = test()
+             results.append(result)
+         except Exception as e:
+             print(f"❌ Test failed with exception: {e}")
+             results.append(False)
+
+     # Summary
+     print("\n" + "=" * 50)
+     print("📊 Validation Results Summary")
+     print("=" * 50)
+
+     passed = sum(results)
+     total = len(results)
+
+     print(f"✅ Passed: {passed}/{total}")
+     print(f"❌ Failed: {total - passed}/{total}")
+
+     if all(results):
+         print("\n🎉 All fixes validated successfully!")
+         print("\n✅ Runtime errors should be resolved:")
+         print(" • SQLite database path fixed")
+         print(" • Hugging Face cache permissions fixed")
+         print(" • Docker container ready for deployment")
+         return 0
+     else:
+         print("\n⚠️ Some fixes need attention. Please check the errors above.")
+         return 1
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
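
The write check in `test_cache_directory` creates and removes a fixed file name, `test_write.tmp`. A minimal sketch of the same probe using `tempfile`, which picks a unique name and removes the file on close. `is_writable_dir` is a hypothetical helper and is not part of `validate_fixes.py`:

```python
import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if path is an existing directory a file can be created in."""
    if not os.path.isdir(path):
        return False
    try:
        # The temporary file gets a unique name and is deleted when closed.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Falls back to the system temp directory when the variable is unset.
    cache_dir = os.environ.get("TRANSFORMERS_CACHE", tempfile.gettempdir())
    print(cache_dir, "writable:", is_writable_dir(cache_dir))
```

A probe like this only reflects the permissions of the user running it, so a result obtained as root during a build may not hold for the runtime user.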