---
title: Legal Dashboard OCR System
sdk: docker
emoji: 🚀
colorFrom: indigo
colorTo: yellow
pinned: true
---

# Legal Dashboard OCR System

An AI-powered system for processing Persian legal documents, with OCR built on Hugging Face models.

## 🚀 Features

- **Advanced OCR Processing**: Hugging Face TrOCR models for Persian text extraction
- **AI-Powered Scoring**: Weighted document scoring based on relevance, completeness, recency, source credibility, and quality
- **Automatic Categorization**: AI-driven document category prediction
- **Real-time Dashboard**: Live analytics and document management
- **WebSocket Support**: Real-time updates and notifications
- **Comprehensive API**: RESTful API for all operations
- **Persian Language Support**: Optimized for Persian/Farsi legal documents

## 🏗️ Architecture

```
legal_dashboard_ocr/
├── app/                     # Backend application
│   ├── main.py             # FastAPI entry point
│   ├── api/                # API route handlers
│   │   ├── documents.py    # Document CRUD operations
│   │   ├── ocr.py         # OCR processing endpoints
│   │   └── dashboard.py   # Dashboard analytics
│   ├── services/           # Business logic services
│   │   ├── ocr_service.py # OCR pipeline
│   │   ├── database_service.py # Database operations
│   │   └── ai_service.py  # AI scoring engine
│   └── models/             # Data models
│       └── document_models.py
├── frontend/               # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                  # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                   # Sample documents
│   └── sample_persian.pdf
├── huggingface_space/      # HF Space deployment
│   ├── app.py             # Gradio interface
│   ├── Spacefile          # Deployment config
│   └── README.md          # Space documentation
└── requirements.txt        # Dependencies
```

## 🛠️ Installation

### Prerequisites

- Python 3.10+
- pip
- Git

### Setup

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd legal_dashboard_ocr
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   # Create .env file
   echo "HF_TOKEN=your_huggingface_token" > .env
   ```

4. **Run the application**
   ```bash
   # Start the FastAPI server
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
   ```

5. **Access the application**
   - Web Dashboard: http://localhost:8000
   - API Documentation: http://localhost:8000/docs
   - Health Check: http://localhost:8000/health
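
The `.env` file created in step 3 has to be read by the application at startup. How the project does this is not shown here; the following is a minimal sketch using only the standard library. The helper names `load_env_file` and `get_hf_token` are hypothetical, and the parser handles plain `KEY=VALUE` lines only.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file into a dict (no shell expansion)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path: str = ".env"):
    """Prefer the real environment, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

Checking the process environment first means a token set by the hosting platform takes precedence over a local file.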

## 📖 Usage

### Web Interface

1. **Upload PDF**: Navigate to the dashboard and upload a Persian legal document
2. **Process Document**: Click "Process PDF" to extract text using OCR
3. **Review Results**: View extracted text, AI analysis, and quality metrics
4. **Save Document**: Optionally save processed documents to the database
5. **View Analytics**: Check dashboard statistics and trends

### API Usage

#### Process PDF with OCR
```bash
# curl sets the multipart Content-Type (including the boundary) itself when -F is used
curl -X POST "http://localhost:8000/api/ocr/process" \
  -F "file=@document.pdf"
```
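
The same upload can be made from Python. The sketch below builds the multipart body by hand with the standard library, assuming the endpoint expects a form field named `file` as in the curl call. The function names `build_ocr_request` and `upload_pdf` are hypothetical.

```python
import uuid
import urllib.request


def build_ocr_request(pdf_bytes: bytes, filename: str,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /api/ocr/process."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode("utf-8"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        f"{base_url}/api/ocr/process",
        data=body,
        method="POST",
        # The boundary in the header must match the one used in the body.
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def upload_pdf(path: str) -> str:
    """Send a PDF from disk and return the raw response text."""
    with open(path, "rb") as fh:
        request = build_ocr_request(fh.read(), path.rsplit("/", 1)[-1])
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode("utf-8")
```

Calling `upload_pdf("document.pdf")` against a running server mirrors the curl command above. The shape of the response depends on the server and is not assumed here.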

#### Get Documents
```bash
curl "http://localhost:8000/api/documents/?limit=10&offset=0"
```

#### Create Document
```bash
curl -X POST "http://localhost:8000/api/documents/" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Legal Document",
    "full_text": "Extracted text content",
    "source": "Uploaded",
    "category": "قانون"
  }'
```
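
For scripted access, the two document calls above can be wrapped in a small client. This is a sketch using only the standard library. The helper names are hypothetical, and the `send` helper assumes the server replies with JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def list_documents_request(limit: int = 10, offset: int = 0) -> urllib.request.Request:
    """Build a GET request for a page of documents."""
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    return urllib.request.Request(f"{BASE_URL}/api/documents/?{query}", method="GET")


def create_document_request(title: str, full_text: str,
                            source: str, category: str) -> urllib.request.Request:
    """Build a POST request with the same JSON fields as the curl example."""
    payload = {
        "title": title,
        "full_text": full_text,
        "source": source,
        "category": category,
    }
    # ensure_ascii=False keeps Persian text as UTF-8 instead of \u escapes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/documents/",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def send(request: urllib.request.Request):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(create_document_request("Legal Document", "Extracted text content", "Uploaded", "قانون"))` submits the same document as the curl call.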

#### Get Dashboard Summary
```bash
curl "http://localhost:8000/api/dashboard/summary"
```

## 🔧 Configuration

### OCR Models

The system supports multiple Hugging Face OCR models:

- `microsoft/trocr-base-stage1`: Default model for printed text
- `microsoft/trocr-base-handwritten`: For handwritten text
- `microsoft/trocr-large-stage1`: Larger model with higher accuracy, at the cost of speed and memory
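
As an illustration of how one of these models might be selected and run, the sketch below uses the `transformers` TrOCR classes. The registry keys (`default`, `handwritten`, `large`) and both function names are hypothetical and not taken from `ocr_service.py`. TrOCR operates on images of single text lines, so the sketch assumes the PDF page has already been rendered and cropped to one line.

```python
OCR_MODELS = {
    # Keys are hypothetical labels; values are the model IDs listed above.
    "default": "microsoft/trocr-base-stage1",
    "handwritten": "microsoft/trocr-base-handwritten",
    "large": "microsoft/trocr-large-stage1",
}


def resolve_model(kind: str = "default") -> str:
    """Map a short label to a Hugging Face model ID."""
    try:
        return OCR_MODELS[kind]
    except KeyError:
        raise ValueError(
            f"Unknown OCR model kind {kind!r}; expected one of {sorted(OCR_MODELS)}"
        ) from None


def recognize_line(image_path: str, kind: str = "default") -> str:
    """Run TrOCR on one cropped text-line image and return the decoded string."""
    # Imported lazily so this module loads without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    model_name = resolve_model(kind)
    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

In a real service the processor and model would be loaded once and reused, since `from_pretrained` is expensive.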

### AI Scoring Weights

The AI scoring engine uses configurable weights:

- Keyword Relevance: 30%
- Document Completeness: 25%
- Recency: 20%
- Source Credibility: 15%
- Document Quality: 10%
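
One plausible reading of these weights is a weighted sum of per-criterion scores. The sketch below assumes each criterion is scored in the range 0 to 1 and that the final score is reported on a 0 to 100 scale. Both are assumptions, as are the dictionary keys and the function name `weighted_score`.

```python
SCORING_WEIGHTS = {
    "keyword_relevance": 0.30,
    "completeness": 0.25,
    "recency": 0.20,
    "source_credibility": 0.15,
    "quality": 0.10,
}


def weighted_score(components: dict, weights: dict = SCORING_WEIGHTS) -> float:
    """Combine per-criterion scores in [0, 1] into one score on a 0-100 scale."""
    missing = set(weights) - set(components)
    if missing:
        raise ValueError(f"Missing score components: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Clamp each component to [0, 1] so one bad input cannot skew the result.
    score = sum(
        weights[name] * min(max(components[name], 0.0), 1.0) for name in weights
    )
    # Dividing by the total keeps the scale stable if the weights are retuned.
    return round(100 * score / total_weight, 2)
```

With these weights, a document scoring 1.0 on keyword relevance and 0.0 elsewhere gets 30.0, and a document scoring 1.0 on every criterion gets 100.0.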

### Database

The system stores its data in a SQLite database with tables for:
- Documents
- AI training data
- System metrics
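
The actual schema lives in `database_service.py` and is not reproduced here. As a rough illustration, the three tables could be laid out as follows. All table and column names are assumptions, chosen to match the fields used in the Create Document example.

```python
import sqlite3

# Hypothetical schema; the real table and column names may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    full_text TEXT,
    source TEXT,
    category TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS ai_training_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER REFERENCES documents(id),
    feedback TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS system_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    value REAL,
    recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.executescript(SCHEMA)
    return conn


def insert_document(conn: sqlite3.Connection, title: str, full_text: str,
                    source: str, category: str) -> int:
    """Insert one document with a parameterized query and return its row ID."""
    cur = conn.execute(
        "INSERT INTO documents (title, full_text, source, category) VALUES (?, ?, ?, ?)",
        (title, full_text, source, category),
    )
    conn.commit()
    return cur.lastrowid
```

SQLite stores text as Unicode, so Persian values such as `قانون` round-trip without extra configuration.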

## 🧪 Testing

### Run Tests
```bash
# Run all tests
python -m pytest tests/

# Run specific test
python -m pytest tests/test_api_endpoints.py

# Run with coverage
python -m pytest tests/ --cov=app
```

### Test Coverage
- API endpoint testing
- OCR pipeline validation
- Database operations
- AI scoring accuracy
- Frontend integration

## 🚀 Deployment

### Hugging Face Spaces

1. **Create a new Space** on Hugging Face
2. **Upload the project** files
3. **Set environment variables**:
   - `HF_TOKEN`: Your Hugging Face token
4. **Deploy** the Space

### Docker Deployment

```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

### Production Deployment

1. **Set up a production server**
2. **Install dependencies**
3. **Configure environment variables**
4. **Set up reverse proxy** (nginx)
5. **Run with gunicorn**:
   ```bash
   gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker
   ```

## 📊 API Documentation

### Endpoints

#### Documents
- `GET /api/documents/` - List documents
- `POST /api/documents/` - Create document
- `GET /api/documents/{id}` - Get document
- `PUT /api/documents/{id}` - Update document
- `DELETE /api/documents/{id}` - Delete document

#### OCR
- `POST /api/ocr/process` - Process PDF
- `POST /api/ocr/process-and-save` - Process and save
- `POST /api/ocr/batch-process` - Batch processing
- `GET /api/ocr/status` - OCR status

#### Dashboard
- `GET /api/dashboard/summary` - Dashboard summary
- `GET /api/dashboard/charts-data` - Chart data
- `GET /api/dashboard/ai-suggestions` - AI suggestions
- `POST /api/dashboard/ai-feedback` - Submit feedback

### Response Formats

All API responses follow a standard JSON format containing:
- Success/error status
- Data payload
- Metadata (timestamps, pagination, etc.)
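
The exact field names are not specified above. The sketch below shows one possible envelope with those three parts. The keys `status`, `data`, `error`, and `metadata`, and the helper name `make_response`, are assumptions for illustration only.

```python
from datetime import datetime, timezone


def make_response(data=None, *, success: bool = True,
                  error: str = None, pagination: dict = None) -> dict:
    """Wrap a payload in a status/data/metadata envelope (hypothetical field names)."""
    metadata = {"timestamp": datetime.now(timezone.utc).isoformat()}
    if pagination is not None:
        metadata["pagination"] = pagination
    return {
        "status": "success" if success else "error",
        "data": data,
        "error": error,
        "metadata": metadata,
    }
```

A paginated listing might then be returned as `make_response(documents, pagination={"limit": 10, "offset": 0})`, and a failure as `make_response(success=False, error="Document not found")`.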

## 🔒 Security

### Authentication
- API key authentication for production
- Rate limiting on endpoints
- Input validation and sanitization
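
As a rough sketch of the first two measures, the code below checks an API key in constant time and applies a per-client sliding-window rate limit. It is framework-independent and uses only the standard library. The names `is_valid_api_key` and `SlidingWindowLimiter` are hypothetical, and the project may implement these checks differently.

```python
import hmac
import time
from collections import defaultdict, deque


def is_valid_api_key(presented: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client ID -> timestamps of recent requests

    def allow(self, client_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

This limiter keeps its state in process memory, so with several workers each worker would enforce its own limit. A shared store would be needed for a global limit.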

### Data Protection
- Secure file upload handling
- Temporary file cleanup
- Database connection security
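
Upload handling and temporary file cleanup can be combined in one helper that validates the upload, writes it to a temporary file, and always removes the file afterwards. This is a minimal sketch. The size limit, the `%PDF-` signature check, and the function name `process_upload` are assumptions rather than the project's actual rules.

```python
import os
import tempfile

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # hypothetical 20 MiB limit


def process_upload(pdf_bytes: bytes, handler):
    """Write an upload to a temp file, run handler(path), and always delete the file."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("Upload does not look like a PDF")
    if len(pdf_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError("Upload exceeds the size limit")
    # mkstemp picks an unpredictable name, so the client-supplied filename
    # never reaches the filesystem.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(pdf_bytes)
        return handler(path)
    finally:
        # Runs whether the handler succeeds or raises.
        if os.path.exists(path):
            os.remove(path)
```

Here `handler` stands in for the OCR step: any callable that takes a file path and returns a result.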

## 🤝 Contributing

1. **Fork the repository**
2. **Create a feature branch**
3. **Make your changes**
4. **Add tests** for new functionality
5. **Submit a pull request**

### Development Guidelines

- Follow PEP 8 style guide
- Add type hints to functions
- Write comprehensive docstrings
- Include unit tests
- Update documentation

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Hugging Face for OCR models
- FastAPI for the web framework
- Gradio for the Space interface
- Microsoft for TrOCR models

## 📞 Support

For support and questions:
- Create an issue on GitHub
- Check the documentation
- Review the API docs at `/docs`

## 🔄 Changelog

### v1.0.0
- Initial release
- OCR pipeline with Hugging Face models
- AI scoring engine
- Dashboard interface
- RESTful API
- Hugging Face Space deployment