Legal Dashboard OCR - Deployment Instructions

🚀 Quick Start

1. Local Development Setup

# Clone or navigate to the project
cd legal_dashboard_ocr

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export HF_TOKEN="your_huggingface_token"

# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

2. Access the Application

Once the server is running, the backend is reachable at:

  • API root: http://localhost:8000
  • Interactive API docs: http://localhost:8000/docs
  • Health check: http://localhost:8000/health

📦 Project Structure

legal_dashboard_ocr/
├── README.md                    # Main documentation
├── requirements.txt             # Python dependencies
├── test_structure.py            # Structure verification
├── DEPLOYMENT_INSTRUCTIONS.md   # This file
├── app/                         # Backend application
│   ├── __init__.py
│   ├── main.py                  # FastAPI entry point
│   ├── api/                     # API routes
│   │   ├── __init__.py
│   │   ├── documents.py         # Document CRUD
│   │   ├── ocr.py               # OCR processing
│   │   └── dashboard.py         # Dashboard analytics
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── ocr_service.py       # OCR pipeline
│   │   ├── database_service.py  # Database operations
│   │   └── ai_service.py        # AI scoring
│   └── models/                  # Data models
│       ├── __init__.py
│       └── document_models.py   # Pydantic schemas
├── frontend/                    # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                       # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                        # Sample documents
│   └── sample_persian.pdf
└── huggingface_space/           # HF Space deployment
    ├── app.py                   # Gradio interface
    ├── Spacefile                # Deployment config
    └── README.md                # Space documentation

🔧 Configuration

Environment Variables

Create a .env file in the project root:

# Hugging Face Token (required for OCR models)
HF_TOKEN=your_huggingface_token_here

# Database configuration (optional)
DATABASE_URL=sqlite:///legal_documents.db

# Server configuration (optional)
HOST=0.0.0.0
PORT=8000
DEBUG=true
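
How these values reach the application is up to app/main.py; a minimal sketch of the usual pattern in Python, assuming python-dotenv is installed (it may not be listed in requirements.txt):

# Hypothetical settings snippet; the real app may simply read os.environ
import os
from dotenv import load_dotenv  # provided by python-dotenv (assumed installed)

load_dotenv()  # loads .env from the project root into the process environment

HF_TOKEN = os.environ["HF_TOKEN"]                                  # required
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///legal_documents.db")
DEBUG = os.getenv("DEBUG", "false").lower() == "true"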

Hugging Face Token

  1. Go to https://huggingface.co/settings/tokens
  2. Create a new token with read permissions
  3. Add it to your environment variables (a quick verification sketch follows this list)
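
To verify the token before starting the app, you can query the Hugging Face Hub directly; huggingface_hub is normally installed alongside transformers:

# Confirm the token resolves to your account before launching the app
import os
from huggingface_hub import whoami

info = whoami(token=os.environ["HF_TOKEN"])
print("Token belongs to:", info["name"])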

🧪 Testing

Run Structure Test

python test_structure.py

Run API Tests

# Install test dependencies
pip install pytest pytest-asyncio

# Run tests
python -m pytest tests/
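
For reference, a minimal test in the same style as the suite, assuming app.main exposes the FastAPI app and the /health route shown under Monitoring:

# tests/test_health.py (illustrative only; not one of the existing test files)
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_health_endpoint():
    response = client.get("/health")
    assert response.status_code == 200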

Manual Testing

# Test OCR endpoint
curl -X POST "http://localhost:8000/api/ocr/process" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@data/sample_persian.pdf"

# Test dashboard
curl "http://localhost:8000/api/dashboard/summary"
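
The same checks can be scripted in Python for repeatable smoke tests (the requests package is assumed to be available):

import requests

BASE = "http://localhost:8000"

# Upload the sample PDF to the OCR endpoint
with open("data/sample_persian.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/api/ocr/process", files={"file": f})
print(resp.status_code, resp.json())

# Fetch the dashboard summary
print(requests.get(f"{BASE}/api/dashboard/summary").json())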

🚀 Deployment Options

1. Hugging Face Spaces

Automatic Deployment

  1. Create a new Space on Hugging Face
  2. Upload all files from the huggingface_space/ directory (or push them with the script shown after this list)
  3. Set the HF_TOKEN environment variable in Space settings
  4. The Space will automatically build and deploy
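
As referenced in step 2, the files can also be pushed from a short script; the repo_id below is a placeholder for your own Space:

# Push huggingface_space/ to an existing Space (repo_id is a placeholder)
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
api.upload_folder(
    folder_path="huggingface_space",
    repo_id="your-username/legal-dashboard-ocr",
    repo_type="space",
)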

Manual Deployment

# Navigate to HF Space directory
cd huggingface_space

# Install dependencies
pip install -r ../requirements.txt

# Run the Gradio app
python app.py
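
The actual Gradio entry point ships in huggingface_space/app.py; the general shape of such a wrapper is roughly the following (process_pdf here is a hypothetical placeholder):

# Sketch of a Gradio wrapper; the shipped app.py may differ
import gradio as gr

def process_pdf(file):
    # Placeholder: the real implementation would run the OCR pipeline here
    return f"Received: {file}"

demo = gr.Interface(
    fn=process_pdf,
    inputs=gr.File(label="Legal PDF"),
    outputs="text",
    title="Legal Dashboard OCR",
)
demo.launch()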

2. Docker Deployment

Create Dockerfile

FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and Run

# Build Docker image
docker build -t legal-dashboard-ocr .

# Run container
docker run -p 8000:8000 \
  -e HF_TOKEN=your_token \
  legal-dashboard-ocr

3. Production Deployment

Using Gunicorn

# Install gunicorn
pip install gunicorn

# Run with multiple workers
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000

Using Nginx (Reverse Proxy)

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

πŸ” Troubleshooting

Common Issues

1. Import Errors

# Ensure you're in the correct directory
cd legal_dashboard_ocr

# Install dependencies
pip install -r requirements.txt

# Check Python path
python -c "import sys; print(sys.path)"

2. OCR Model Loading Issues

# Check HF token
echo $HF_TOKEN

# Test model download
python -c "from transformers import pipeline; p = pipeline('image-to-text', 'microsoft/trocr-base-stage1')"
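
If the one-liner works, a slightly fuller check runs the model on a rendered page image (page.png is a placeholder filename):

# Run TrOCR on a single rendered page image (page.png is hypothetical)
from PIL import Image
from transformers import pipeline

ocr = pipeline("image-to-text", model="microsoft/trocr-base-stage1")
result = ocr(Image.open("page.png").convert("RGB"))
print(result[0]["generated_text"])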

3. Database Issues

# Check database file
ls -la legal_documents.db

# Reset database (if needed)
rm legal_documents.db

4. Port Already in Use

# Find process using port 8000
lsof -i :8000

# Kill process
kill -9 <PID>

# Or use different port
uvicorn app.main:app --port 8001

Performance Optimization

1. Model Caching

# In app/services/ocr_service.py
# Models are automatically cached by Hugging Face
# Cache location: ~/.cache/huggingface/
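
Reusing one loaded pipeline per process avoids re-initialising the model on every request; a sketch of that pattern (the actual ocr_service.py may already do something equivalent):

# Load the OCR pipeline once per process and reuse it across requests
from functools import lru_cache
from transformers import pipeline

@lru_cache(maxsize=1)
def get_ocr_pipeline(model_name: str = "microsoft/trocr-base-stage1"):
    # First call downloads/loads weights (cached under ~/.cache/huggingface/);
    # subsequent calls return the same in-memory object
    return pipeline("image-to-text", model=model_name)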

2. Database Optimization

-- Add indexes for better performance
CREATE INDEX idx_documents_category ON documents(category);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);
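
These statements can be applied directly to the SQLite file referenced by DATABASE_URL; adding IF NOT EXISTS keeps the script safe to re-run:

# Apply the indexes to legal_documents.db (assumes the documents table exists)
import sqlite3

conn = sqlite3.connect("legal_documents.db")
conn.executescript("""
CREATE INDEX IF NOT EXISTS idx_documents_category ON documents(category);
CREATE INDEX IF NOT EXISTS idx_documents_status ON documents(status);
CREATE INDEX IF NOT EXISTS idx_documents_created_at ON documents(created_at);
""")
conn.commit()
conn.close()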

3. Memory Management

# In app/main.py (or after large OCR jobs)
# Note: gc.collect() does not configure memory limits; it only forces a
# garbage-collection pass, which can release memory after large documents
import gc
gc.collect()  # Force garbage collection

📊 Monitoring

Health Check

curl http://localhost:8000/health
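
The health route itself is a plain FastAPI endpoint; if you need to add or adjust one, the pattern looks like this (a sketch, not the project's exact code):

# Minimal health-check route in the app/main.py style
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}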

API Documentation

FastAPI serves interactive API documentation at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).

Logs

# View application logs
tail -f logs/app.log

# View error logs
grep ERROR logs/app.log

🔒 Security

Production Checklist

  • Set DEBUG=false in production
  • Use HTTPS in production
  • Implement rate limiting
  • Add authentication/authorization
  • Secure file upload validation
  • Regular security updates

Environment Security

# Secure environment variables
export HF_TOKEN="your_secure_token"
export DATABASE_URL="your_secure_db_url"

# Use .env file (don't commit to git)
echo "HF_TOKEN=your_token" > .env
echo ".env" >> .gitignore

📈 Scaling

Horizontal Scaling

# Run multiple instances
uvicorn app.main:app --host 0.0.0.0 --port 8000 &
uvicorn app.main:app --host 0.0.0.0 --port 8001 &
uvicorn app.main:app --host 0.0.0.0 --port 8002 &

Load Balancing

upstream legal_dashboard {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://legal_dashboard;
    }
}

🆘 Support

Getting Help

  1. Check the logs for error messages
  2. Verify environment variables are set
  3. Test with the sample PDF in data/
  4. Check the API documentation at /docs

Common Commands

# Start development server
uvicorn app.main:app --reload

# Run tests
python -m pytest tests/

# Check structure
python test_structure.py

# View API docs
open http://localhost:8000/docs

🎯 Next Steps

  1. Deploy to Hugging Face Spaces for easy sharing
  2. Add authentication for production use
  3. Implement user management for multi-user support
  4. Add more OCR models for different document types
  5. Create mobile app for document scanning
  6. Add batch processing for multiple documents
  7. Implement advanced analytics and reporting

Note: This project is designed for Persian legal documents. Ensure your documents are clear and well-scanned for best OCR results.