metadata
title: Legal Dashboard OCR System
sdk: docker
emoji: π
colorFrom: indigo
colorTo: yellow
pinned: true
Legal Dashboard OCR System
AI-powered system for processing Persian legal documents, with OCR built on Hugging Face models.
Features
- Advanced OCR Processing: Hugging Face TrOCR models for Persian text extraction
- AI-Powered Scoring: Intelligent document quality assessment and scoring
- Automatic Categorization: AI-driven document category prediction
- Real-time Dashboard: Live analytics and document management
- WebSocket Support: Real-time updates and notifications
- Comprehensive API: RESTful API for all operations
- Persian Language Support: Optimized for Persian/Farsi legal documents
Architecture
legal_dashboard_ocr/
├── app/                            # Backend application
│   ├── main.py                     # FastAPI entry point
│   ├── api/                        # API route handlers
│   │   ├── documents.py            # Document CRUD operations
│   │   ├── ocr.py                  # OCR processing endpoints
│   │   └── dashboard.py            # Dashboard analytics
│   ├── services/                   # Business logic services
│   │   ├── ocr_service.py          # OCR pipeline
│   │   ├── database_service.py     # Database operations
│   │   └── ai_service.py           # AI scoring engine
│   └── models/                     # Data models
│       └── document_models.py
├── frontend/                       # Web interface
│   ├── improved_legal_dashboard.html
│   └── test_integration.html
├── tests/                          # Test suite
│   ├── test_api_endpoints.py
│   └── test_ocr_pipeline.py
├── data/                           # Sample documents
│   └── sample_persian.pdf
├── huggingface_space/              # HF Space deployment
│   ├── app.py                      # Gradio interface
│   ├── Spacefile                   # Deployment config
│   └── README.md                   # Space documentation
└── requirements.txt                # Dependencies
Installation
Prerequisites
- Python 3.10+
- pip
- Git
Setup
Clone the repository
git clone <repository-url>
cd legal_dashboard_ocr
Install dependencies
pip install -r requirements.txt
Set up environment variables
# Create .env file
echo "HF_TOKEN=your_huggingface_token" > .env
Run the application
# Start the FastAPI server
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
Access the application
- Web Dashboard: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
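The Setup steps above write HF_TOKEN to a .env file, but the source does not show how the application reads it back. The following is only a minimal sketch, assuming a plain KEY=VALUE file; the helper names `load_env_file` and `get_hf_token` are hypothetical, and the project may well use a library such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path: str = ".env") -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

Checking the process environment first lets a deployment platform override the file without editing it.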
Usage
Web Interface
- Upload PDF: Navigate to the dashboard and upload a Persian legal document
- Process Document: Click "Process PDF" to extract text using OCR
- Review Results: View extracted text, AI analysis, and quality metrics
- Save Document: Optionally save processed documents to the database
- View Analytics: Check dashboard statistics and trends
API Usage
Process PDF with OCR
curl -X POST "http://localhost:8000/api/ocr/process" \
-H "Content-Type: multipart/form-data" \
-F "<form-field>=@<path-to-pdf>"
Get Documents
curl "http://localhost:8000/api/documents?limit=10&offset=0"
Create Document
curl -X POST "http://localhost:8000/api/documents/" \
-H "Content-Type: application/json" \
-d '{
"title": "Legal Document",
"full_text": "Extracted text content",
"source": "Uploaded",
"category": "قانون"
}'
Get Dashboard Summary
curl "http://localhost:8000/api/dashboard/summary"
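The same calls can be made from Python. The sketch below uses only the standard library and mirrors the list and create requests shown above; the function names are hypothetical, and the shape of the JSON returned by the server is not specified in this README, so `send` simply decodes whatever comes back.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # local server from the Setup section


def list_documents_request(limit: int = 10, offset: int = 0) -> urllib.request.Request:
    """Build the GET /api/documents request with pagination parameters."""
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    return urllib.request.Request(f"{BASE_URL}/api/documents?{query}", method="GET")


def create_document_request(
    title: str, full_text: str, source: str, category: str
) -> urllib.request.Request:
    """Build the POST /api/documents/ request with a UTF-8 JSON body."""
    payload = {
        "title": title,
        "full_text": full_text,
        "source": source,
        "category": category,
    }
    # ensure_ascii=False keeps Persian text readable in the request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/documents/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(list_documents_request(limit=10))` performs the equivalent of the "Get Documents" curl command.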
Configuration
OCR Models
The system supports multiple Hugging Face OCR models:
- microsoft/trocr-base-stage1: Default model for printed text
- microsoft/trocr-base-handwritten: For handwritten text
- microsoft/trocr-large-stage1: Higher accuracy model
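How the application switches between these models is not shown here. One possible arrangement, sketched under that caveat, is a small registry keyed by use case; the short names (`printed`, `handwritten`, `large`) and both function names are hypothetical. `load_trocr` imports the `transformers` package lazily because it is a heavy dependency and downloads model weights on first use.

```python
# Model identifiers from the list above, keyed by hypothetical short names.
OCR_MODELS = {
    "printed": "microsoft/trocr-base-stage1",
    "handwritten": "microsoft/trocr-base-handwritten",
    "large": "microsoft/trocr-large-stage1",
}
DEFAULT_KIND = "printed"


def resolve_model_name(kind: str = DEFAULT_KIND) -> str:
    """Map a short name to a model id, falling back to the default model."""
    return OCR_MODELS.get(kind, OCR_MODELS[DEFAULT_KIND])


def load_trocr(kind: str = DEFAULT_KIND):
    """Load the processor and model for the chosen kind.

    Requires the transformers package; weights are downloaded on first use.
    """
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    name = resolve_model_name(kind)
    processor = TrOCRProcessor.from_pretrained(name)
    model = VisionEncoderDecoderModel.from_pretrained(name)
    return processor, model
```

Falling back to the default for unknown names keeps a misconfigured setting from crashing the service, at the cost of silently ignoring typos; raising an error instead would be an equally reasonable choice.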
AI Scoring Weights
The AI scoring engine uses configurable weights:
- Keyword Relevance: 30%
- Document Completeness: 25%
- Recency: 20%
- Source Credibility: 15%
- Document Quality: 10%
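The five weights above sum to 100%, so they can be read as a weighted average. The sketch below shows that arithmetic, assuming each criterion is already scored on a 0 to 1 scale; the key names and the `weighted_score` function are illustrative and not taken from ai_service.py.

```python
from typing import Dict

# Weights from the list above, expressed as fractions of 1.0.
WEIGHTS: Dict[str, float] = {
    "keyword_relevance": 0.30,
    "completeness": 0.25,
    "recency": 0.20,
    "source_credibility": 0.15,
    "quality": 0.10,
}


def weighted_score(components: Dict[str, float]) -> float:
    """Combine per-criterion scores in [0, 1] into a single score in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components.get(name, 0.0)  # a missing criterion counts as 0
        value = min(1.0, max(0.0, value))  # clamp out-of-range inputs
        total += weight * value
    return total
```

For example, component scores of 0.8, 0.6, 0.5, 1.0, and 0.4 (in the order listed) give 0.24 + 0.15 + 0.10 + 0.15 + 0.04 = 0.68.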
Database
SQLite database with tables for:
- Documents
- AI training data
- System metrics
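The actual schema lives in database_service.py and is not reproduced in this README. As a rough sketch of what three such tables could look like, the following uses the standard `sqlite3` module; every table and column name is an assumption, loosely based on the fields in the "Create Document" request.

```python
import sqlite3

# Hypothetical schema; the real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    full_text TEXT,
    source TEXT,
    category TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS ai_training_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER REFERENCES documents(id),
    feedback TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS system_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    value REAL,
    recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_document(
    conn: sqlite3.Connection, title: str, full_text: str, source: str, category: str
) -> int:
    """Insert one document using bound parameters and return its row id."""
    cursor = conn.execute(
        "INSERT INTO documents (title, full_text, source, category) VALUES (?, ?, ?, ?)",
        (title, full_text, source, category),
    )
    conn.commit()
    return cursor.lastrowid
```

Bound parameters (the `?` placeholders) keep user-supplied text out of the SQL string, which matters for documents uploaded by untrusted users.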
Testing
Run Tests
# Run all tests
python -m pytest tests/
# Run specific test
python -m pytest tests/test_api_endpoints.py
# Run with coverage
python -m pytest tests/ --cov=app
Test Coverage
- API endpoint testing
- OCR pipeline validation
- Database operations
- AI scoring accuracy
- Frontend integration
Deployment
Hugging Face Spaces
- Create a new Space on Hugging Face
- Upload the project files
- Set environment variables:
  - HF_TOKEN: Your Hugging Face token
- Deploy the Space
Docker Deployment
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Production Deployment
- Set up a production server
- Install dependencies
- Configure environment variables
- Set up reverse proxy (nginx)
- Run with gunicorn:
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker
API Documentation
Endpoints
Documents
- GET /api/documents/: List documents
- POST /api/documents/: Create document
- GET /api/documents/{id}: Get document
- PUT /api/documents/{id}: Update document
- DELETE /api/documents/{id}: Delete document
OCR
- POST /api/ocr/process: Process PDF
- POST /api/ocr/process-and-save: Process and save
- POST /api/ocr/batch-process: Batch processing
- GET /api/ocr/status: OCR status
Dashboard
- GET /api/dashboard/summary: Dashboard summary
- GET /api/dashboard/charts-data: Chart data
- GET /api/dashboard/ai-suggestions: AI suggestions
- POST /api/dashboard/ai-feedback: Submit feedback
Response Formats
All API responses are JSON objects that include:
- Success/error status
- Data payload
- Metadata (timestamps, pagination, etc.)
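The exact field names of this envelope are not documented here. Purely as an illustration of the three parts listed above, a response builder might look like the sketch below; the keys `success`, `data`, `error`, and `metadata` and the function `make_response` are assumptions, so check the live schema at /docs before relying on them.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def make_response(
    data: Any = None,
    error: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    total: Optional[int] = None,
) -> Dict[str, Any]:
    """Wrap a payload with a status flag and metadata (hypothetical layout)."""
    metadata: Dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Pagination details are attached only for list-style responses.
    if limit is not None and offset is not None:
        metadata["pagination"] = {"limit": limit, "offset": offset, "total": total}
    return {
        "success": error is None,
        "data": data,
        "error": error,
        "metadata": metadata,
    }
```

Keeping the same top-level keys for success and failure lets clients branch on one flag instead of inspecting the payload shape.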
Security
Authentication
- API key authentication for production
- Rate limiting on endpoints
- Input validation and sanitization
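The README does not say how API keys are checked or how rate limits are enforced. The sketch below shows one common approach using only the standard library: a constant-time key comparison and an in-memory sliding-window limiter. The class and function names are hypothetical, and a production deployment would more likely keep counters in a shared store so limits hold across gunicorn workers.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


def api_key_is_valid(provided: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record a request and report whether it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The optional `now` argument exists so the limiter can be tested with fixed timestamps instead of real clock time.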
Data Protection
- Secure file upload handling
- Temporary file cleanup
- Database connection security
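As an illustration of the first two points, the sketch below validates an uploaded PDF and guarantees that its temporary copy is deleted after processing. It is not the project's actual upload handler: the 10 MB limit, the function names, and the choice of checks are all assumptions.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit, not from the source


def validate_pdf_upload(filename: str, content: bytes) -> None:
    """Raise ValueError if the upload is not a plausible, reasonably sized PDF."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("only .pdf files are accepted")
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the upload size limit")
    if not content.startswith(PDF_MAGIC):
        raise ValueError("file content does not look like a PDF")


@contextmanager
def temporary_pdf(content: bytes) -> Iterator[str]:
    """Write content to a private temp file and delete it when the block exits."""
    # mkstemp picks its own file name, so the client-supplied name never
    # touches the filesystem.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

A handler would call `validate_pdf_upload` first and then run OCR inside `with temporary_pdf(content) as path:`, so the file is removed even if processing raises an exception.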
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
Development Guidelines
- Follow PEP 8 style guide
- Add type hints to functions
- Write comprehensive docstrings
- Include unit tests
- Update documentation
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Hugging Face for OCR models
- FastAPI for the web framework
- Gradio for the Space interface
- Microsoft for TrOCR models
Support
For support and questions:
- Create an issue on GitHub
- Check the documentation
- Review the API docs at /docs
Changelog
v1.0.0
- Initial release
- OCR pipeline with Hugging Face models
- AI scoring engine
- Dashboard interface
- RESTful API
- Hugging Face Space deployment