Legal Dashboard OCR - Deployment Instructions

🚀 Quick Start

1. Local Development Setup

# Clone or navigate to the project
cd legal_dashboard_ocr

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export HF_TOKEN="your_huggingface_token"

# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

2. Access the Application

Once the server is running, the backend is reachable at:

  • API root: http://localhost:8000
  • Interactive API docs: http://localhost:8000/docs
  • Health check: http://localhost:8000/health

📦 Project Structure

legal_dashboard_ocr/
├── README.md                    # Main documentation
├── requirements.txt             # Python dependencies
├── test_structure.py            # Structure verification
├── DEPLOYMENT_INSTRUCTIONS.md   # This file
├── app/                         # Backend application
│   ├── __init__.py
│   ├── main.py                  # FastAPI entry point
│   ├── api/                     # API routes
│   │   ├── __init__.py
│   │   ├── documents.py         # Document CRUD
│   │   ├── ocr.py               # OCR processing
│   │   └── dashboard.py         # Dashboard analytics
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── ocr_service.py       # OCR pipeline
│   │   ├── database_service.py  # Database operations
│   │   └── ai_service.py        # AI scoring
│   └── models/                  # Data models
│       ├── __init__.py
│       └── document_models.py   # Pydantic schemas
├── frontend/                    # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                       # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                        # Sample documents
│   └── sample_persian.pdf
└── huggingface_space/           # HF Space deployment
    ├── app.py                   # Gradio interface
    ├── Spacefile                # Deployment config
    └── README.md                # Space documentation

🔧 Configuration

Environment Variables

Create a .env file in the project root:

# Hugging Face Token (required for OCR models)
HF_TOKEN=your_huggingface_token_here

# Database configuration (optional)
DATABASE_URL=sqlite:///legal_documents.db

# Server configuration (optional)
HOST=0.0.0.0
PORT=8000
DEBUG=true
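
How these values reach the application is up to app/main.py; a minimal sketch of the usual pattern in Python, assuming python-dotenv is installed (it may not be listed in requirements.txt):

# Hypothetical settings snippet; the real app may simply read os.environ
import os
from dotenv import load_dotenv  # provided by python-dotenv (assumed installed)

load_dotenv()  # loads .env from the project root into the process environment

HF_TOKEN = os.environ["HF_TOKEN"]                                  # required
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///legal_documents.db")
DEBUG = os.getenv("DEBUG", "false").lower() == "true"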

Hugging Face Token

  1. Go to https://huggingface.co/settings/tokens
  2. Create a new token with read permissions
  3. Add it to your environment variables (a quick verification sketch follows this list)
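
To verify the token before starting the app, you can query the Hugging Face Hub directly; huggingface_hub is normally installed alongside transformers:

# Confirm the token resolves to your account before launching the app
import os
from huggingface_hub import whoami

info = whoami(token=os.environ["HF_TOKEN"])
print("Token belongs to:", info["name"])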

🧪 Testing

Run Structure Test

python test_structure.py

Run API Tests

# Install test dependencies
pip install pytest pytest-asyncio

# Run tests
python -m pytest tests/
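
For reference, a minimal test in the same style as the suite, assuming app.main exposes the FastAPI app and the /health route shown under Monitoring:

# tests/test_health.py (illustrative only; not one of the existing test files)
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_health_endpoint():
    response = client.get("/health")
    assert response.status_code == 200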

Manual Testing

# Test OCR endpoint
curl -X POST "http://localhost:8000/api/ocr/process" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@data/sample_persian.pdf"

# Test dashboard
curl "http://localhost:8000/api/dashboard/summary"
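
The same checks can be scripted in Python for repeatable smoke tests (the requests package is assumed to be available):

import requests

BASE = "http://localhost:8000"

# Upload the sample PDF to the OCR endpoint
with open("data/sample_persian.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/api/ocr/process", files={"file": f})
print(resp.status_code, resp.json())

# Fetch the dashboard summary
print(requests.get(f"{BASE}/api/dashboard/summary").json())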

🚀 Deployment Options

1. Hugging Face Spaces

Automatic Deployment

  1. Create a new Space on Hugging Face
  2. Upload all files from the huggingface_space/ directory (or push them with the script shown after this list)
  3. Set the HF_TOKEN environment variable in Space settings
  4. The Space will automatically build and deploy
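
As referenced in step 2, the files can also be pushed from a short script; the repo_id below is a placeholder for your own Space:

# Push huggingface_space/ to an existing Space (repo_id is a placeholder)
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
api.upload_folder(
    folder_path="huggingface_space",
    repo_id="your-username/legal-dashboard-ocr",
    repo_type="space",
)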

Manual Deployment

# Navigate to HF Space directory
cd huggingface_space

# Install dependencies
pip install -r ../requirements.txt

# Run the Gradio app
python app.py
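
The actual Gradio entry point ships in huggingface_space/app.py; the general shape of such a wrapper is roughly the following (process_pdf here is a hypothetical placeholder):

# Sketch of a Gradio wrapper; the shipped app.py may differ
import gradio as gr

def process_pdf(file):
    # Placeholder: the real implementation would run the OCR pipeline here
    return f"Received: {file}"

demo = gr.Interface(
    fn=process_pdf,
    inputs=gr.File(label="Legal PDF"),
    outputs="text",
    title="Legal Dashboard OCR",
)
demo.launch()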

2. Docker Deployment

Create Dockerfile

FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and Run

# Build Docker image
docker build -t legal-dashboard-ocr .

# Run container
docker run -p 8000:8000 \
  -e HF_TOKEN=your_token \
  legal-dashboard-ocr

3. Production Deployment

Using Gunicorn

# Install gunicorn
pip install gunicorn

# Run with multiple workers
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000

Using Nginx (Reverse Proxy)

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

πŸ” Troubleshooting

Common Issues

1. Import Errors

# Ensure you're in the correct directory
cd legal_dashboard_ocr

# Install dependencies
pip install -r requirements.txt

# Check Python path
python -c "import sys; print(sys.path)"

2. OCR Model Loading Issues

# Check HF token
echo $HF_TOKEN

# Test model download
python -c "from transformers import pipeline; p = pipeline('image-to-text', 'microsoft/trocr-base-stage1')"
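
If the one-liner works, a slightly fuller check runs the model on a rendered page image (page.png is a placeholder filename):

# Run TrOCR on a single rendered page image (page.png is hypothetical)
from PIL import Image
from transformers import pipeline

ocr = pipeline("image-to-text", model="microsoft/trocr-base-stage1")
result = ocr(Image.open("page.png").convert("RGB"))
print(result[0]["generated_text"])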

3. Database Issues

# Check database file
ls -la legal_documents.db

# Reset database (if needed)
rm legal_documents.db

4. Port Already in Use

# Find process using port 8000
lsof -i :8000

# Kill process
kill -9 <PID>

# Or use different port
uvicorn app.main:app --port 8001

Performance Optimization

1. Model Caching

# In app/services/ocr_service.py
# Models are automatically cached by Hugging Face
# Cache location: ~/.cache/huggingface/
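
Reusing one loaded pipeline per process avoids re-initialising the model on every request; a sketch of that pattern (the actual ocr_service.py may already do something equivalent):

# Load the OCR pipeline once per process and reuse it across requests
from functools import lru_cache
from transformers import pipeline

@lru_cache(maxsize=1)
def get_ocr_pipeline(model_name: str = "microsoft/trocr-base-stage1"):
    # First call downloads/loads weights (cached under ~/.cache/huggingface/);
    # subsequent calls return the same in-memory object
    return pipeline("image-to-text", model=model_name)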

2. Database Optimization

-- Add indexes for better performance
CREATE INDEX idx_documents_category ON documents(category);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);
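
These statements can be applied directly to the SQLite file referenced by DATABASE_URL; adding IF NOT EXISTS keeps the script safe to re-run:

# Apply the indexes to legal_documents.db (assumes the documents table exists)
import sqlite3

conn = sqlite3.connect("legal_documents.db")
conn.executescript("""
CREATE INDEX IF NOT EXISTS idx_documents_category ON documents(category);
CREATE INDEX IF NOT EXISTS idx_documents_status ON documents(status);
CREATE INDEX IF NOT EXISTS idx_documents_created_at ON documents(created_at);
""")
conn.commit()
conn.close()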

3. Memory Management

# In app/main.py (or after large OCR jobs)
# Note: gc.collect() does not configure memory limits; it only forces a
# garbage-collection pass, which can release memory after large documents
import gc
gc.collect()  # Force garbage collection

📊 Monitoring

Health Check

curl http://localhost:8000/health
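
The health route itself is a plain FastAPI endpoint; if you need to add or adjust one, the pattern looks like this (a sketch, not the project's exact code):

# Minimal health-check route in the app/main.py style
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}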

API Documentation

FastAPI serves interactive API documentation at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).

Logs

# View application logs
tail -f logs/app.log

# View error logs
grep ERROR logs/app.log

🔒 Security

Production Checklist

  • Set DEBUG=false in production
  • Use HTTPS in production
  • Implement rate limiting
  • Add authentication/authorization
  • Secure file upload validation
  • Regular security updates

Environment Security

# Secure environment variables
export HF_TOKEN="your_secure_token"
export DATABASE_URL="your_secure_db_url"

# Use .env file (don't commit to git)
echo "HF_TOKEN=your_token" > .env
echo ".env" >> .gitignore

📈 Scaling

Horizontal Scaling

# Run multiple instances
uvicorn app.main:app --host 0.0.0.0 --port 8000 &
uvicorn app.main:app --host 0.0.0.0 --port 8001 &
uvicorn app.main:app --host 0.0.0.0 --port 8002 &

Load Balancing

upstream legal_dashboard {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://legal_dashboard;
    }
}

🆘 Support

Getting Help

  1. Check the logs for error messages
  2. Verify environment variables are set
  3. Test with the sample PDF in data/
  4. Check the API documentation at /docs

Common Commands

# Start development server
uvicorn app.main:app --reload

# Run tests
python -m pytest tests/

# Check structure
python test_structure.py

# View API docs
open http://localhost:8000/docs

🎯 Next Steps

  1. Deploy to Hugging Face Spaces for easy sharing
  2. Add authentication for production use
  3. Implement user management for multi-user support
  4. Add more OCR models for different document types
  5. Create mobile app for document scanning
  6. Add batch processing for multiple documents
  7. Implement advanced analytics and reporting

Note: This project is designed for Persian legal documents. Ensure your documents are clear and well-scanned for best OCR results.