Legal Dashboard OCR - Deployment Instructions
Quick Start
1. Local Development Setup
# Clone or navigate to the project
cd legal_dashboard_ocr
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export HF_TOKEN="your_huggingface_token"
# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
2. Access the Application
- Web Dashboard: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
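To confirm from a script that the server is up, the health endpoint can be polled. The sketch below assumes only that `/health` answers with HTTP 200 when the app is ready; the helper name `check_health` is illustrative and not part of the project.

```python
import urllib.error
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Once the server from step 1 is running, `check_health()` should return `True`; if nothing is listening on the port it returns `False` instead of raising.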
Project Structure
legal_dashboard_ocr/
├── README.md                          # Main documentation
├── requirements.txt                   # Python dependencies
├── test_structure.py                  # Structure verification
├── DEPLOYMENT_INSTRUCTIONS.md         # This file
├── app/                               # Backend application
│   ├── __init__.py
│   ├── main.py                        # FastAPI entry point
│   ├── api/                           # API routes
│   │   ├── __init__.py
│   │   ├── documents.py               # Document CRUD
│   │   ├── ocr.py                     # OCR processing
│   │   └── dashboard.py               # Dashboard analytics
│   ├── services/                      # Business logic
│   │   ├── __init__.py
│   │   ├── ocr_service.py             # OCR pipeline
│   │   ├── database_service.py        # Database operations
│   │   └── ai_service.py              # AI scoring
│   └── models/                        # Data models
│       ├── __init__.py
│       └── document_models.py         # Pydantic schemas
├── frontend/                          # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                             # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                              # Sample documents
│   └── sample_persian.pdf
└── huggingface_space/                 # HF Space deployment
    ├── app.py                         # Gradio interface
    ├── Spacefile                      # Deployment config
    └── README.md                      # Space documentation
Configuration
Environment Variables
Create a .env file in the project root:
# Hugging Face Token (required for OCR models)
HF_TOKEN=your_huggingface_token_here
# Database configuration (optional)
DATABASE_URL=sqlite:///legal_documents.db
# Server configuration (optional)
HOST=0.0.0.0
PORT=8000
DEBUG=true
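As an illustration of how these variables might be consumed, here is a minimal sketch that reads them from the process environment and applies the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual code, and the sketch assumes the .env file has already been loaded into the environment (for example by exporting the values or with a dotenv loader).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    hf_token: str
    database_url: str
    host: str
    port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast without HF_TOKEN."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; the OCR models need it")
    return Settings(
        hf_token=token,
        database_url=env.get("DATABASE_URL", "sqlite:///legal_documents.db"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Treat common truthy spellings as True; anything else is False.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Failing at startup when the required token is missing tends to be easier to diagnose than a model download error later on.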
Hugging Face Token
- Go to https://huggingface.co/settings/tokens
- Create a new token with read permissions
- Add it to your environment variables
Testing
Run Structure Test
python test_structure.py
Run API Tests
# Install test dependencies
pip install pytest pytest-asyncio
# Run tests
python -m pytest tests/
Manual Testing
# Test OCR endpoint
curl -X POST "http://localhost:8000/api/ocr/process" \
-H "Content-Type: multipart/form-data" \
-F "file=@data/sample_persian.pdf"
# Test dashboard
curl "http://localhost:8000/api/dashboard/summary"
Deployment Options
1. Hugging Face Spaces
Automatic Deployment
- Create a new Space on Hugging Face
- Upload all files from the huggingface_space/ directory
- Set the HF_TOKEN environment variable in Space settings
- The Space will automatically build and deploy
Manual Deployment
# Navigate to HF Space directory
cd huggingface_space
# Install dependencies
pip install -r ../requirements.txt
# Run the Gradio app
python app.py
2. Docker Deployment
Create Dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Build and Run
# Build Docker image
docker build -t legal-dashboard-ocr .
# Run container
docker run -p 8000:8000 \
-e HF_TOKEN=your_token \
legal-dashboard-ocr
3. Production Deployment
Using Gunicorn
# Install gunicorn
pip install gunicorn
# Run with multiple workers
gunicorn app.main:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000
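The same options can also live in a Gunicorn configuration file, which is plain Python. A minimal sketch, assuming a file named gunicorn.conf.py in the project root; the GUNICORN_BIND and GUNICORN_WORKERS variable names are illustrative and are not defined by the project or by Gunicorn itself.

```python
# gunicorn.conf.py
import os

# Mirrors the command-line flags shown above.
bind = os.environ.get("GUNICORN_BIND", "0.0.0.0:8000")   # hypothetical env var
workers = int(os.environ.get("GUNICORN_WORKERS", "4"))   # hypothetical env var
worker_class = "uvicorn.workers.UvicornWorker"
```

Start the server with `gunicorn -c gunicorn.conf.py app.main:app`.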
Using Nginx (Reverse Proxy)
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Troubleshooting
Common Issues
1. Import Errors
# Ensure you're in the correct directory
cd legal_dashboard_ocr
# Install dependencies
pip install -r requirements.txt
# Check Python path
python -c "import sys; print(sys.path)"
2. OCR Model Loading Issues
# Check HF token
echo $HF_TOKEN
# Test model download
python -c "from transformers import pipeline; p = pipeline('image-to-text', 'microsoft/trocr-base-stage1')"
3. Database Issues
# Check database file
ls -la legal_documents.db
# Reset database (if needed)
rm legal_documents.db
4. Port Already in Use
# Find process using port 8000
lsof -i :8000
# Kill process
kill -9 <PID>
# Or use different port
uvicorn app.main:app --port 8001
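On systems without lsof, the same check can be done from Python. This is a small sketch using only the standard library; `port_in_use` and `first_free_port` are hypothetical helper names, and the 8000 to 8010 range is an arbitrary assumption.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, end: int = 8010) -> int:
    """Return the first port in [start, end] with no listener, else raise."""
    for port in range(start, end + 1):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port between {start} and {end}")
```

The result of `first_free_port()` can then be passed to uvicorn through its `--port` option.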
Performance Optimization
1. Model Caching
# In app/services/ocr_service.py
# Models are automatically cached by Hugging Face
# Cache location: ~/.cache/huggingface/
2. Database Optimization
-- Add indexes for better performance
CREATE INDEX idx_documents_category ON documents(category);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);
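Running those statements twice fails because the indexes already exist. The sketch below applies them idempotently with `IF NOT EXISTS` through the standard sqlite3 module. It assumes the default SQLite database and a `documents` table with the three columns named above; `ensure_indexes` is a hypothetical helper, not part of database_service.py.

```python
import sqlite3

INDEX_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS idx_documents_category ON documents(category)",
    "CREATE INDEX IF NOT EXISTS idx_documents_status ON documents(status)",
    "CREATE INDEX IF NOT EXISTS idx_documents_created_at ON documents(created_at)",
]


def ensure_indexes(conn: sqlite3.Connection) -> list[str]:
    """Create any missing indexes and return the index names on `documents`."""
    with conn:  # commits on success, rolls back on error
        for statement in INDEX_STATEMENTS:
            conn.execute(statement)
    rows = conn.execute("PRAGMA index_list('documents')").fetchall()
    return sorted(row[1] for row in rows)  # column 1 holds the index name
```

For example, `ensure_indexes(sqlite3.connect("legal_documents.db"))` can be called at startup without harm if the indexes are already present.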
3. Memory Management
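# In app/main.py
# gc.collect() does not set a memory limit; it only frees unreachable objects
# (such as reference cycles). Calling it after a large OCR job may release memory sooner.
import gc
gc.collect()  # Force a garbage collection pass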
Monitoring
Health Check
curl http://localhost:8000/health
API Documentation
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Logs
# View application logs
tail -f logs/app.log
# View error logs
grep ERROR logs/app.log
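The commands above expect a logs/app.log file whose lines contain the level name. One way to produce such a file is sketched below with the standard logging module. The logger name, rotation size, and format are assumptions chosen for illustration, not the project's actual logging setup.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Send log records to <log_dir>/app.log, rotating at roughly 5 MB."""
    logger = logging.getLogger("legal_dashboard")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            Path(log_dir) / "app.log",
            maxBytes=5_000_000,
            backupCount=3,
            encoding="utf-8",
        )
        # Include the level name so that `grep ERROR logs/app.log` matches.
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging().error("OCR failed")` writes a line that the grep command above will find.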
Security
Production Checklist
- Set DEBUG=false in production
- Use HTTPS in production
- Implement rate limiting
- Add authentication/authorization
- Secure file upload validation
- Regular security updates
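As one possible starting point for the file upload validation item, the sketch below checks the extension, the size, and the leading PDF signature before a file reaches the OCR pipeline. The 10 MB cap, the PDF-only rule, and the `validate_upload` name are assumptions for illustration; the project's real limits may differ.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap; tune to your documents
ALLOWED_SUFFIXES = {".pdf"}          # assumption: only PDFs are accepted


def validate_upload(filename: str, content: bytes) -> None:
    """Raise ValueError if the upload does not look like an acceptable PDF."""
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError("unsupported file type")
    if not content:
        raise ValueError("empty file")
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    # PDF files begin with "%PDF-"; the extension alone is not trusted.
    if not content.startswith(b"%PDF-"):
        raise ValueError("content is not a PDF")
```

A signature check like this rejects renamed files, but it does not prove the PDF is safe to parse, so it complements rather than replaces the other items on the checklist.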
Environment Security
# Secure environment variables
export HF_TOKEN="your_secure_token"
export DATABASE_URL="your_secure_db_url"
# Use .env file (don't commit to git)
echo "HF_TOKEN=your_token" > .env
echo ".env" >> .gitignore
Scaling
Horizontal Scaling
# Run multiple instances
uvicorn app.main:app --host 0.0.0.0 --port 8000 &
uvicorn app.main:app --host 0.0.0.0 --port 8001 &
uvicorn app.main:app --host 0.0.0.0 --port 8002 &
Load Balancing
upstream legal_dashboard {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;

    location / {
        proxy_pass http://legal_dashboard;
    }
}
Support
Getting Help
- Check the logs for error messages
- Verify environment variables are set
- Test with the sample PDF in data/
- Check the API documentation at /docs
Common Commands
# Start development server
uvicorn app.main:app --reload
# Run tests
python -m pytest tests/
# Check structure
python test_structure.py
# View API docs (open is macOS-only; on Linux use xdg-open)
open http://localhost:8000/docs
Next Steps
- Deploy to Hugging Face Spaces for easy sharing
- Add authentication for production use
- Implement user management for multi-user support
- Add more OCR models for different document types
- Create mobile app for document scanning
- Add batch processing for multiple documents
- Implement advanced analytics and reporting
Note: This project is designed for Persian legal documents. Ensure your documents are clear and well-scanned for best OCR results.