# Legal Dashboard OCR - Deployment Instructions
## Quick Start
### 1. Local Development Setup
```bash
# Clone or navigate to the project
cd legal_dashboard_ocr
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export HF_TOKEN="your_huggingface_token"
# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
### 2. Access the Application
- **Web Dashboard**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health
## Project Structure
```
legal_dashboard_ocr/
├── README.md                       # Main documentation
├── requirements.txt                # Python dependencies
├── test_structure.py               # Structure verification
├── DEPLOYMENT_INSTRUCTIONS.md      # This file
├── app/                            # Backend application
│   ├── __init__.py
│   ├── main.py                     # FastAPI entry point
│   ├── api/                        # API routes
│   │   ├── __init__.py
│   │   ├── documents.py            # Document CRUD
│   │   ├── ocr.py                  # OCR processing
│   │   └── dashboard.py            # Dashboard analytics
│   ├── services/                   # Business logic
│   │   ├── __init__.py
│   │   ├── ocr_service.py          # OCR pipeline
│   │   ├── database_service.py     # Database operations
│   │   └── ai_service.py           # AI scoring
│   └── models/                     # Data models
│       ├── __init__.py
│       └── document_models.py      # Pydantic schemas
├── frontend/                       # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                          # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                           # Sample documents
│   └── sample_persian.pdf
└── huggingface_space/              # HF Space deployment
    ├── app.py                      # Gradio interface
    ├── Spacefile                   # Deployment config
    └── README.md                   # Space documentation
```
## Configuration
### Environment Variables
Create a `.env` file in the project root:
```env
# Hugging Face Token (required for OCR models)
HF_TOKEN=your_huggingface_token_here
# Database configuration (optional)
DATABASE_URL=sqlite:///legal_documents.db
# Server configuration (optional)
HOST=0.0.0.0
PORT=8000
DEBUG=true
```
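As an illustration of how these variables could be consumed, here is a minimal sketch that reads them from the process environment and applies the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names, not part of the project, and the sketch assumes the `.env` file has already been loaded into the environment by whatever mechanism the application uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables documented above."""
    hf_token: str
    database_url: str
    host: str
    port: int
    debug: bool


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # HF_TOKEN is the only variable the guide marks as required
        raise RuntimeError("HF_TOKEN is not set; it is required for the OCR models")
    return Settings(
        hf_token=token,
        database_url=env.get("DATABASE_URL", "sqlite:///legal_documents.db"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Treat common truthy spellings as True; anything else is False
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Failing fast on a missing `HF_TOKEN` surfaces the problem at startup instead of at the first OCR request.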
### Hugging Face Token
1. Go to https://huggingface.co/settings/tokens
2. Create a new token with read permissions
3. Add it to your environment variables
## Testing
### Run Structure Test
```bash
python test_structure.py
```
### Run API Tests
```bash
# Install test dependencies
pip install pytest pytest-asyncio
# Run tests
python -m pytest tests/
```
### Manual Testing
```bash
# Test OCR endpoint
curl -X POST "http://localhost:8000/api/ocr/process" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@data/sample_persian.pdf"
# Test dashboard
curl "http://localhost:8000/api/dashboard/summary"
```
## Deployment Options
### 1. Hugging Face Spaces
#### Automatic Deployment
1. Create a new Space on Hugging Face
2. Upload all files from the `huggingface_space/` directory
3. Add `HF_TOKEN` in the Space settings (as a secret, so the token is not publicly visible)
4. The Space will automatically build and deploy
#### Manual Deployment
```bash
# Navigate to HF Space directory
cd huggingface_space
# Install dependencies
pip install -r ../requirements.txt
# Run the Gradio app
python app.py
```
### 2. Docker Deployment
#### Create Dockerfile
```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
#### Build and Run
```bash
# Build Docker image
docker build -t legal-dashboard-ocr .
# Run container
docker run -p 8000:8000 \
  -e HF_TOKEN=your_token \
  legal-dashboard-ocr
```
### 3. Production Deployment
#### Using Gunicorn
```bash
# Install gunicorn
pip install gunicorn
# Run with multiple workers
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```
#### Using Nginx (Reverse Proxy)
```nginx
server {
    listen 80;
    server_name your-domain.com;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
## Troubleshooting
### Common Issues
#### 1. Import Errors
```bash
# Ensure you're in the correct directory
cd legal_dashboard_ocr
# Install dependencies
pip install -r requirements.txt
# Check Python path
python -c "import sys; print(sys.path)"
```
#### 2. OCR Model Loading Issues
```bash
# Check HF token
echo $HF_TOKEN
# Test model download
python -c "from transformers import pipeline; p = pipeline('image-to-text', 'microsoft/trocr-base-stage1')"
```
#### 3. Database Issues
```bash
# Check database file
ls -la legal_documents.db
# Reset database (if needed; this deletes all stored data)
rm legal_documents.db
```
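Before deleting the database file, it may be worth checking whether it is actually damaged. The sketch below runs SQLite's built-in `PRAGMA integrity_check` against the file in read-only mode. The function name `check_database` is hypothetical, and the default path assumes the `DATABASE_URL` shown in the configuration section.

```python
import sqlite3
from pathlib import Path


def check_database(path: str = "legal_documents.db") -> list:
    """Return the rows reported by SQLite's integrity check (["ok"] means healthy)."""
    # mode=ro opens the file read-only and fails if it does not exist,
    # instead of silently creating an empty database
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

A result other than `["ok"]` lists the problems SQLite found; a missing file raises `sqlite3.OperationalError`.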
#### 4. Port Already in Use
```bash
# Find process using port 8000
lsof -i :8000
# Kill process
kill -9 <PID>
# Or use different port
uvicorn app.main:app --port 8001
```
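Where `lsof` is not available, the same check can be sketched with the Python standard library. The helper names `port_is_free` and `first_free_port` are hypothetical. The check only detects listeners reachable on the given host, so treat it as a quick diagnostic, not a guarantee.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepts the connection
        return sock.connect_ex((host, port)) != 0


def first_free_port(start: int = 8000, attempts: int = 10) -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")
```

The returned value can then be passed to `uvicorn app.main:app --port <port>`.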
### Performance Optimization
#### 1. Model Caching
```python
# In app/services/ocr_service.py
# Downloaded model files are cached on disk automatically by the Hugging Face libraries
# Default cache location: ~/.cache/huggingface/
```
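The disk cache avoids repeated downloads, but the model still has to be loaded into memory in each process. A common complement is to load it once per process and reuse the instance. The sketch below is a generic, thread-safe lazy loader. `LazyModel` is a hypothetical helper, not something the project is known to contain.

```python
import threading
from typing import Any, Callable


class LazyModel:
    """Load an expensive object on first use and reuse it afterwards."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._model: Any = None
        self._loaded = False

    def get(self) -> Any:
        if not self._loaded:  # fast path once the model is loaded
            with self._lock:
                if not self._loaded:  # re-check: another thread may have loaded it
                    self._model = self._loader()
                    self._loaded = True
        return self._model


# Possible use with the pipeline call from the troubleshooting section
# (commented out so the sketch runs without transformers installed):
# from transformers import pipeline
# ocr_model = LazyModel(lambda: pipeline("image-to-text", "microsoft/trocr-base-stage1"))
# result = ocr_model.get()(image)
```

Deferring the load to the first request keeps startup fast. Calling `get()` during application startup instead moves the cost away from the first user.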
#### 2. Database Optimization
```sql
-- Add indexes for better performance
CREATE INDEX idx_documents_category ON documents(category);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);
```
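If the indexes should be applied from application code, one option is an idempotent helper that can run at every startup. This is a sketch using the standard `sqlite3` module. It assumes the `documents` table and the three columns named in the SQL above already exist, and `ensure_indexes` is a hypothetical name.

```python
import sqlite3

# Index name -> column, taken from the SQL statements above
INDEXES = {
    "idx_documents_category": "category",
    "idx_documents_status": "status",
    "idx_documents_created_at": "created_at",
}


def ensure_indexes(conn: sqlite3.Connection) -> list:
    """Create missing indexes and return the index names on `documents`."""
    with conn:  # commits on success, rolls back on error
        for name, column in INDEXES.items():
            # IF NOT EXISTS makes repeated calls harmless
            conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON documents({column})")
    rows = conn.execute("PRAGMA index_list('documents')").fetchall()
    return sorted(row[1] for row in rows)  # column 1 of each row is the index name
```

Unlike the plain `CREATE INDEX` statements, this version does not fail when an index is already present.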
#### 3. Memory Management
```python
# In app/main.py
# gc.collect() does not set a memory limit; it only forces a collection pass,
# which can help release unreferenced objects after a large OCR job
import gc
gc.collect()  # Force garbage collection
```
## Monitoring
### Health Check
```bash
curl http://localhost:8000/health
```
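For scripted checks, such as waiting for the server to come up before running tests, the same request can be sketched with the standard library. The function names are hypothetical, and the sketch assumes only that `/health` answers with a 2xx status when the service is up. The response body format is not documented here, so it is not inspected.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response
        return False


def wait_until_healthy(url: str = "http://localhost:8000/health",
                       attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the endpoint until it is healthy or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(url):
            return True
        time.sleep(delay)
    return False
```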
### API Documentation
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### Logs
```bash
# View application logs
tail -f logs/app.log
# View error logs
grep ERROR logs/app.log
```
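The commands above expect a `logs/app.log` file containing level names such as `ERROR`. How the application configures logging is not shown in this guide, so the following is only a sketch of one setup that would produce such a file. The logger name `legal_dashboard`, the rotation size, and the function name are all assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Attach a rotating file handler that writes to <log_dir>/app.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("legal_dashboard")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers when called more than once
        handler = RotatingFileHandler(
            Path(log_dir) / "app.log",
            maxBytes=5_000_000,  # assumed limit: rotate at roughly 5 MB
            backupCount=3,
            encoding="utf-8",
        )
        # Including the level name keeps `grep ERROR logs/app.log` useful
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```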
## Security
### Production Checklist
- [ ] Set `DEBUG=false` in production
- [ ] Use HTTPS in production
- [ ] Implement rate limiting
- [ ] Add authentication/authorization
- [ ] Secure file upload validation
- [ ] Regular security updates
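As an example of the upload validation item, here is a minimal sketch that checks the file name, size, and leading bytes of an uploaded document. It assumes only PDF uploads are accepted (the sample document is a PDF) and a 10 MB limit. Both values, and the name `validate_upload`, are assumptions, not settings taken from the project.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit, adjust as needed
ALLOWED_SUFFIXES = {".pdf"}          # assumption: PDF only


def validate_upload(filename: str, content: bytes) -> str:
    """Return a safe file name, or raise ValueError if the upload is rejected."""
    # Keep only the final path component so "../../x.pdf" cannot escape the upload dir
    safe_name = Path(filename.replace("\\", "/")).name
    if not safe_name or Path(safe_name).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError("only PDF uploads are accepted")
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file is too large")
    # PDF files start with the "%PDF-" signature; the extension alone proves nothing
    if not content.startswith(b"%PDF-"):
        raise ValueError("file content is not a PDF")
    return safe_name
```

A signature check like this is a first filter only and does not prove the file is safe to parse.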
### Environment Security
```bash
# Secure environment variables
export HF_TOKEN="your_secure_token"
export DATABASE_URL="your_secure_db_url"
# Use .env file (don't commit to git)
echo "HF_TOKEN=your_token" > .env
echo ".env" >> .gitignore
```
## Scaling
### Horizontal Scaling
```bash
# Run multiple instances
uvicorn app.main:app --host 0.0.0.0 --port 8000 &
uvicorn app.main:app --host 0.0.0.0 --port 8001 &
uvicorn app.main:app --host 0.0.0.0 --port 8002 &
```
### Load Balancing
```nginx
upstream legal_dashboard {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}
server {
    listen 80;
    location / {
        proxy_pass http://legal_dashboard;
    }
}
```
## Support
### Getting Help
1. Check the logs for error messages
2. Verify environment variables are set
3. Test with the sample PDF in `data/`
4. Check the API documentation at `/docs`
### Common Commands
```bash
# Start development server
uvicorn app.main:app --reload
# Run tests
python -m pytest tests/
# Check structure
python test_structure.py
# View API docs (macOS; use xdg-open on Linux)
open http://localhost:8000/docs
```
## Next Steps
1. **Deploy to Hugging Face Spaces** for easy sharing
2. **Add authentication** for production use
3. **Implement user management** for multi-user support
4. **Add more OCR models** for different document types
5. **Create mobile app** for document scanning
6. **Add batch processing** for multiple documents
7. **Implement advanced analytics** and reporting
---
**Note**: This project is designed for Persian legal documents. Ensure your documents are clear and well-scanned for best OCR results.