# Legal Dashboard OCR - Deployment Instructions

## 🚀 Quick Start

### 1. Local Development Setup

```bash
# Clone or navigate to the project
cd legal_dashboard_ocr

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export HF_TOKEN="your_huggingface_token"

# Run the application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

### 2. Access the Application

- **Web Dashboard**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health

## 📦 Project Structure

```
legal_dashboard_ocr/
├── README.md                    # Main documentation
├── requirements.txt             # Python dependencies
├── test_structure.py            # Structure verification
├── DEPLOYMENT_INSTRUCTIONS.md   # This file
├── app/                         # Backend application
│   ├── __init__.py
│   ├── main.py                  # FastAPI entry point
│   ├── api/                     # API routes
│   │   ├── __init__.py
│   │   ├── documents.py         # Document CRUD
│   │   ├── ocr.py               # OCR processing
│   │   └── dashboard.py         # Dashboard analytics
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── ocr_service.py       # OCR pipeline
│   │   ├── database_service.py  # Database operations
│   │   └── ai_service.py        # AI scoring
│   └── models/                  # Data models
│       ├── __init__.py
│       └── document_models.py   # Pydantic schemas
├── frontend/                    # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                       # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                        # Sample documents
│   └── sample_persian.pdf
└── huggingface_space/           # HF Space deployment
    ├── app.py                   # Gradio interface
    ├── Spacefile                # Deployment config
    └── README.md                # Space documentation
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file in the project root:

```env
# Hugging Face Token (required for OCR models)
HF_TOKEN=your_huggingface_token_here

# Database configuration (optional)
DATABASE_URL=sqlite:///legal_documents.db

# Server configuration (optional)
HOST=0.0.0.0
PORT=8000
DEBUG=true
```
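
The variables above can be read in one place at startup. The following is a minimal sketch, not the project's actual code: `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the example `.env` values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: str
    database_url: str
    host: str
    port: int
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    if not token:
        # HF_TOKEN is the only variable this guide marks as required.
        raise RuntimeError("HF_TOKEN is required for the OCR models")
    return Settings(
        hf_token=token,
        database_url=env.get("DATABASE_URL", "sqlite:///legal_documents.db"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Assumed convention: "1", "true" or "yes" (any case) enables debug mode.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Python does not read a `.env` file on its own. One way to populate the environment first is a loader such as python-dotenv, or uvicorn's `--env-file` option.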

### Hugging Face Token

1. Go to https://huggingface.co/settings/tokens
2. Create a new token with read permissions
3. Add it to your environment variables

## 🧪 Testing

### Run Structure Test
```bash
python test_structure.py
```

### Run API Tests
```bash
# Install test dependencies
pip install pytest pytest-asyncio

# Run tests
python -m pytest tests/
```

### Manual Testing
```bash
# Test OCR endpoint
# (-F makes curl send multipart/form-data and set the Content-Type header itself)
curl -X POST "http://localhost:8000/api/ocr/process" \
  -F "file=@data/sample_persian.pdf"

# Test dashboard
curl "http://localhost:8000/api/dashboard/summary"
```

## 🚀 Deployment Options

### 1. Hugging Face Spaces

#### Automatic Deployment
1. Create a new Space on Hugging Face
2. Upload all files from the `huggingface_space/` directory
3. Set the `HF_TOKEN` environment variable in Space settings
4. The Space will automatically build and deploy

#### Manual Deployment
```bash
# Navigate to HF Space directory
cd huggingface_space

# Install dependencies
pip install -r ../requirements.txt

# Run the Gradio app
python app.py
```

### 2. Docker Deployment

#### Create Dockerfile
```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

#### Build and Run
```bash
# Build Docker image
docker build -t legal-dashboard-ocr .

# Run container
docker run -p 8000:8000 \
  -e HF_TOKEN=your_token \
  legal-dashboard-ocr
```

### 3. Production Deployment

#### Using Gunicorn
```bash
# Install gunicorn
pip install gunicorn

# Run with multiple workers
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```
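
The same options can be kept in a Gunicorn configuration file, which is plain Python. This is a sketch under assumptions: the file name `gunicorn.conf.py`, the cap of four workers, and the 120-second timeout are illustrative choices, not values taken from the project.

```python
# gunicorn.conf.py (hypothetical file name, passed with `gunicorn -c`)
import multiprocessing
import os

# Bind address; PORT falls back to the 8000 used throughout this guide.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# Workers are separate processes, so each one likely loads its own copy of
# the OCR models. Cap the count at 4, matching the command-line example.
workers = min(4, multiprocessing.cpu_count())

# Run the FastAPI (ASGI) app through uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumed value: OCR requests can be slow, so allow more than the 30 s default.
timeout = 120
```

With this file in the project root, the server could be started with `gunicorn -c gunicorn.conf.py app.main:app`.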

#### Using Nginx (Reverse Proxy)
```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

## 🔍 Troubleshooting

### Common Issues

#### 1. Import Errors
```bash
# Ensure you're in the correct directory
cd legal_dashboard_ocr

# Install dependencies
pip install -r requirements.txt

# Check Python path
python -c "import sys; print(sys.path)"
```

#### 2. OCR Model Loading Issues
```bash
# Check HF token
echo $HF_TOKEN

# Test model download
python -c "from transformers import pipeline; p = pipeline('image-to-text', 'microsoft/trocr-base-stage1')"
```

#### 3. Database Issues
```bash
# Check database file
ls -la legal_documents.db

# Reset database (if needed; this deletes all stored data)
rm legal_documents.db
```

#### 4. Port Already in Use
```bash
# Find process using port 8000
lsof -i :8000

# Kill process
kill -9 <PID>

# Or use different port
uvicorn app.main:app --port 8001
```
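
The same check can be done from Python, for instance before choosing a port for a local run. A minimal sketch using only the standard library; `port_in_use` and `first_free_port` are hypothetical helper names, not part of the project.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, attempts: int = 10) -> int:
    """Return the first port in [start, start + attempts) that is not in use."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```

The result of `first_free_port()` could then be passed to `uvicorn app.main:app --port <port>`.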

### Performance Optimization

#### 1. Model Caching
```python
# In app/services/ocr_service.py
# Models are automatically cached by Hugging Face
# Cache location: ~/.cache/huggingface/
```

#### 2. Database Optimization
```sql
-- Add indexes for better performance
CREATE INDEX idx_documents_category ON documents(category);
CREATE INDEX idx_documents_status ON documents(status);
CREATE INDEX idx_documents_created_at ON documents(created_at);
```
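
These statements can also be applied from Python so they run on every startup. The sketch below assumes a simplified `documents` table (the real schema may have more columns) and a hypothetical `ensure_indexes` helper. It uses `CREATE INDEX IF NOT EXISTS` so that repeated runs do not fail.

```python
import sqlite3

# Hypothetical minimal schema; the real `documents` table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    category TEXT,
    status TEXT,
    created_at TEXT
);
"""

# Index name -> indexed column, matching the SQL statements above.
INDEXES = {
    "idx_documents_category": "category",
    "idx_documents_status": "status",
    "idx_documents_created_at": "created_at",
}


def ensure_indexes(conn: sqlite3.Connection) -> list[str]:
    """Create missing indexes and return the index names on `documents`."""
    for name, column in INDEXES.items():
        # IF NOT EXISTS makes the call safe to repeat.
        conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON documents({column})")
    conn.commit()
    rows = conn.execute("PRAGMA index_list(documents)").fetchall()
    return sorted(row[1] for row in rows)  # column 1 holds the index name


conn = sqlite3.connect(":memory:")  # in-memory database for demonstration
conn.executescript(SCHEMA)
print(ensure_indexes(conn))
```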

#### 3. Memory Management
```python
# In app/main.py
# Force a garbage collection pass, e.g. after processing a large document.
# This frees unreachable objects; it does not set a memory limit.
import gc

gc.collect()
```

## 📊 Monitoring

### Health Check
```bash
curl http://localhost:8000/health
```
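
For scripted checks, such as waiting for the server to come up after a deploy, the health endpoint can be polled from Python. This sketch assumes that `/health` answers with HTTP 200 when the service is ready. The function names and the retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str,
    retries: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until probe returns 200 or the retries are used up."""
    for attempt in range(retries):
        if probe(url) == 200:
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

A deploy script could call `wait_until_healthy("http://localhost:8000/health")` and abort if it returns `False`.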

### API Documentation
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

### Logs
```bash
# View application logs
tail -f logs/app.log

# View error logs
grep ERROR logs/app.log
```
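
The commands above expect a `logs/app.log` file whose lines include the level name. The following is a sketch of one way such a log could be produced with the standard library. The logger name, rotation size, and backup count are assumptions, not the project's actual settings.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Attach a rotating file handler that writes to <log_dir>/app.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        Path(log_dir) / "app.log",
        maxBytes=5 * 1024 * 1024,  # assumed limit: rotate after about 5 MB
        backupCount=3,             # assumed retention: keep three old files
        encoding="utf-8",
    )
    # The level name appears in every line, so `grep ERROR` finds errors.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger("legal_dashboard")  # hypothetical logger name
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```

Calling `configure_logging()` once at startup and then `logger.error("...")` would produce lines that `grep ERROR logs/app.log` picks up.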

## 🔒 Security

### Production Checklist
- [ ] Set `DEBUG=false` in production
- [ ] Use HTTPS in production
- [ ] Implement rate limiting
- [ ] Add authentication/authorization
- [ ] Secure file upload validation
- [ ] Regular security updates
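
As an illustration of the file upload validation item, the sketch below checks the file name, size, and leading bytes of an upload before it reaches the OCR pipeline. The allowed extension, the 10 MB limit, and the `validate_upload` name are assumptions rather than the project's implementation.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf"}         # assumption: only PDFs are accepted
MAX_UPLOAD_BYTES = 10 * 1024 * 1024   # assumption: 10 MB upload limit


def validate_upload(filename: str, content: bytes) -> str:
    """Return a safe file name, or raise ValueError if the upload is rejected."""
    # Drop directory components to block path traversal ("../../x.pdf").
    safe_name = Path(filename.replace("\\", "/")).name
    if not safe_name:
        raise ValueError("empty file name")
    if Path(safe_name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported file type")
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    # PDF files start with the "%PDF-" signature; the extension alone
    # is not trusted.
    if not content.startswith(b"%PDF-"):
        raise ValueError("content is not a PDF")
    return safe_name
```

In a FastAPI route, a `ValueError` from this helper would typically be turned into a 4xx response instead of passing the file on for processing.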

### Environment Security
```bash
# Secure environment variables
export HF_TOKEN="your_secure_token"
export DATABASE_URL="your_secure_db_url"

# Use .env file (don't commit to git)
echo "HF_TOKEN=your_token" > .env
echo ".env" >> .gitignore
```

## 📈 Scaling

### Horizontal Scaling
```bash
# Run multiple instances
uvicorn app.main:app --host 0.0.0.0 --port 8000 &
uvicorn app.main:app --host 0.0.0.0 --port 8001 &
uvicorn app.main:app --host 0.0.0.0 --port 8002 &
```

### Load Balancing
```nginx
upstream legal_dashboard {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://legal_dashboard;
    }
}
```

## 🆘 Support

### Getting Help
1. Check the logs for error messages
2. Verify environment variables are set
3. Test with the sample PDF in `data/`
4. Check the API documentation at `/docs`

### Common Commands
```bash
# Start development server
uvicorn app.main:app --reload

# Run tests
python -m pytest tests/

# Check structure
python test_structure.py

# View API docs (macOS; use xdg-open on Linux)
open http://localhost:8000/docs
```

## 🎯 Next Steps

1. **Deploy to Hugging Face Spaces** for easy sharing
2. **Add authentication** for production use
3. **Implement user management** for multi-user support
4. **Add more OCR models** for different document types
5. **Create mobile app** for document scanning
6. **Add batch processing** for multiple documents
7. **Implement advanced analytics** and reporting

---

**Note**: This project is designed for Persian legal documents. Ensure your documents are clear and well-scanned for best OCR results.