metadata
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: 1.0.0
pinned: false

Odia OCR Annotation + Synthetic Text Generator

A unified repository that provides:

  • An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, correct the extracted text into validated text, and export CSVs.
  • A synthetic text generator (exposed via the backend API) that renders Odia/Sanskrit-like text with realistic paper textures and effects, and can also process HuggingFace datasets.

Repository Structure

  • backend/
    • app/main.py: FastAPI app with two routers: /api/ocr and /api/synthetic
    • app/api/routers/ocr.py: OCR endpoints (upload, OCR, annotations import/export)
    • app/api/routers/synthetic.py: Synthetic generation endpoints
    • app/services/: Shared services
      • ocr_processor.py: Gemini OCR
      • annotations.py: CSV/JSON I/O
      • synthetic/: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
    • data/: runtime storage
      • uploaded_images/: uploaded images (served at /images)
      • annotations/: annotations.csv and JSON
      • synth_outputs/: generated images and CSVs (served at /static/synthetic)
    • requirements.txt: backend dependencies
  • frontend/
    • Vite + React + Tailwind app
    • Routes: /ocr (annotation UI) and /synthetic (generator UI)
  • content/static/: NotoSans Oriya fonts used by generator

Run Locally

  1. Backend
  • pip install -r backend/requirements.txt
  • From backend/: uvicorn app.main:app --reload
  • Static mounts:
    • /images → backend/data/uploaded_images
    • /static/synthetic → backend/data/synth_outputs
  2. Frontend
  • cd frontend && npm install && npm run dev
  • Open http://localhost:5173
  • Use navigation to switch between OCR and Synthetic pages

OCR API (FastAPI)

  • POST /api/ocr/upload:
    • Multipart form field: files (one or more images)
    • Stores images in backend/data/uploaded_images
  • POST /api/ocr/process:
    • JSON: { "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }
    • Returns: { "img1.png": "extracted text", ... }
  • GET /api/ocr/annotations:
    • Returns the current annotations and lists of valid and missing images
  • POST /api/ocr/save:
    • JSON: { "<filename>": { "extracted_text": "...", "validated_text": "..." } }
    • Saves to CSV and JSON in backend/data/annotations
  • POST /api/ocr/import:
    • Multipart fields: file (the CSV) and image_folder (e.g., uploaded_images)
    • Validates the CSV and returns the annotations plus whether each referenced image is present
  • POST /api/ocr/export:
    • JSON: { "annotations": {...}, "validated_texts": {...} }
    • Returns a downloadable CSV

Note: Legacy endpoints (/upload/, /process-ocr/, etc.) are temporarily supported for the older UI. Prefer /api/ocr/... going forward.
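The JSON endpoints above can be driven from a script. The sketch below is a minimal client built only on the Python standard library; it is not part of this repository. BASE_URL (uvicorn's default address), the helper names, and the choice to fall back to the OCR output when no correction is given are all assumptions for illustration, and post_json assumes the endpoint replies with JSON. The multipart upload step is omitted.

```python
import json
import urllib.request

# Assumption: uvicorn's default bind address; change it for a deployed Space.
BASE_URL = "http://localhost:8000"


def build_process_payload(api_key, image_filenames):
    """Body for POST /api/ocr/process, matching the documented shape."""
    return {"api_key": api_key, "image_filenames": list(image_filenames)}


def build_save_payload(ocr_results, corrections=None):
    """Body for POST /api/ocr/save.

    ocr_results maps filename -> extracted text (the /process response).
    corrections maps filename -> human-validated text. Files without a
    correction keep the OCR output as validated text (a hypothetical default).
    """
    corrections = corrections or {}
    return {
        name: {
            "extracted_text": text,
            "validated_text": corrections.get(name, text),
        }
        for name, text in ocr_results.items()
    }


def post_json(path, payload, base_url=BASE_URL, timeout=120):
    """POST a JSON body and decode the reply (assumes the endpoint returns JSON)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, post build_process_payload(key, ["img1.png"]) to /api/ocr/process, then pass the returned mapping to build_save_payload and post the result to /api/ocr/save. Reading the key from an environment variable (say a hypothetical GEMINI_API_KEY) keeps it out of the server, in line with the per-request design described in the Notes.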

Synthetic API (FastAPI)

  • POST /api/synthetic/generate
    • Modes: single, comprehensive, ultra-realistic, huggingface
    • Request body examples:
      • Non-HF: { "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }
      • HF CSV: { "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }
    • Response:
      • Non-HF: { "status": "ok", "output_dir": "/static/synthetic/<job_id>" }
      • HF: { "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }
    • Outputs are stored under backend/data/synth_outputs/<job_id>/ and publicly served at /static/synthetic/<job_id>/....
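A small helper can assemble request bodies for the four modes and turn the server-relative paths in the response into full links. This is a hedged sketch, not the backend's own validation: it assumes huggingface mode needs dataset_url and that other modes send text only when one is supplied (the examples only show it for single mode). The function names are hypothetical.

```python
from urllib.parse import urljoin

# The four modes listed above.
MODES = ("single", "comprehensive", "ultra-realistic", "huggingface")


def build_generate_request(mode, output_subdir, *, text=None,
                           dataset_url=None, text_column="text",
                           max_samples=None):
    """Assemble a body for POST /api/synthetic/generate.

    Assumption: huggingface mode requires dataset_url; other modes include
    "text" only when it is given.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    body = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode requires dataset_url")
        body["dataset_url"] = dataset_url
        body["text_column"] = text_column
        if max_samples is not None:
            body["max_samples"] = max_samples
    elif text is not None:
        body["text"] = text
    return body


def absolute_links(base_url, response):
    """Convert the /static/synthetic/... paths in a response into full URLs."""
    if response.get("status") != "ok":
        raise RuntimeError(f"generation did not succeed: {response!r}")
    return {
        key: urljoin(base_url, response[key])
        for key in ("output_dir", "csv", "images_dir")
        if key in response
    }
```

Because the returned paths start with "/", urljoin keeps only the scheme and host of base_url, so the same helper works for localhost and for a deployed Space.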

Fonts

  • Generator uses fonts from content/static/.
  • Default: NotoSansOriya_Condensed-Regular.ttf (configurable). Ensure the directory exists.
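To catch a missing font before a long generation run, a preflight check like the one below may help. It is a sketch only: resolve_font is a hypothetical helper, and resolving content/static relative to the current working directory is an assumption that depends on where you launch the process.

```python
from pathlib import Path

# Defaults from the Fonts section. The relative path is resolved against the
# current working directory (an assumption; adjust to your checkout).
DEFAULT_FONT_DIR = "content/static"
DEFAULT_FONT = "NotoSansOriya_Condensed-Regular.ttf"


def resolve_font(font_dir=DEFAULT_FONT_DIR, font_name=DEFAULT_FONT):
    """Return the font path, or raise listing the .ttf files that do exist."""
    directory = Path(font_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"font directory not found: {directory.resolve()}")
    candidate = directory / font_name
    if candidate.is_file():
        return candidate
    available = sorted(p.name for p in directory.glob("*.ttf"))
    raise FileNotFoundError(
        f"{font_name} not found in {directory}; available fonts: {available}"
    )
```

The returned path can then be loaded with Pillow, e.g. ImageFont.truetype(str(path), size).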

Effects & Styles

  • Paper styles: lined paper, old paper, birch, parchment
  • Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps

Notes

  • The backend expects the Gemini API key to be provided per-request to /api/ocr/process. Do not hardcode keys server-side.
  • For HuggingFace datasets, the backend uses the datasets library when possible, or downloads raw CSV URLs directly.
  • You can browse generated outputs via the links returned by /api/synthetic/generate.
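The dataset note above can be sketched as a loader with an optional dependency. This is an illustration under stated assumptions, not the code in huggingface_processor.py: the dispatch rule (URLs ending in .csv are downloaded and parsed with the csv module, anything else is treated as a Hub dataset id), the "train" split, and the function names are all assumptions. Blank rows are skipped and max_samples caps the output.

```python
import csv
import io
import urllib.request


def _clean(values, max_samples):
    """Keep non-blank strings (stripped), stopping after max_samples."""
    texts = []
    for value in values:
        if isinstance(value, str) and value.strip():
            texts.append(value.strip())
            if max_samples is not None and len(texts) >= max_samples:
                break
    return texts


def load_texts(source, text_column="text", max_samples=None):
    """Return up to max_samples non-empty strings from text_column.

    Assumed rule: .csv URLs are fetched directly; anything else goes
    through the optional `datasets` library (train split assumed).
    """
    if source.lower().endswith(".csv"):
        with urllib.request.urlopen(source, timeout=60) as response:
            data = response.read().decode("utf-8-sig")  # tolerate a BOM
        reader = csv.DictReader(io.StringIO(data))
        if not reader.fieldnames or text_column not in reader.fieldnames:
            raise ValueError(f"column {text_column!r} not in {reader.fieldnames}")
        return _clean((row.get(text_column) for row in reader), max_samples)

    try:
        from datasets import load_dataset  # optional dependency
    except ImportError as exc:
        raise RuntimeError(
            "install the 'datasets' package or pass a raw .csv URL"
        ) from exc
    dataset = load_dataset(source, split="train")
    if text_column not in dataset.column_names:
        raise ValueError(f"column {text_column!r} not in {dataset.column_names}")
    return _clean((row[text_column] for row in dataset), max_samples)
```

Importing datasets inside the function means the CSV path keeps working on installs without it, which is one way to get the graceful fallback described under Error Handling.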

Deploy to Hugging Face Spaces (Docker)

This repo includes a multi-stage Dockerfile that deploys the backend and the built frontend together as a single Space.

Steps:

  • Create a new Space → Type: Docker
  • Push this repository to the Space
  • In Space Settings:
    • Enable Persistent Storage
    • (Optional) Add Secrets/Env Vars as needed, e.g., DATA_DIR=/data (already the default) and FRONTEND_DIST=/app/frontend_dist
  • The container exposes port 7860 by default.

What the image does:

  • Builds the frontend (frontend/) and copies its dist/ output to /app/frontend_dist
  • Installs backend dependencies and runs uvicorn app.main:app from backend/
  • Serves:
    • API at /api/...
    • Uploaded images at /images
    • Synthetic outputs at /static/synthetic
    • Frontend SPA at / (served from /app/frontend_dist)
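How the environment variables map onto the served directories can be sketched as plain path resolution. This is a hypothetical helper, not the repository's configuration code: the defaults come from the Space settings above, and the assumption is that the three subdirectories of backend/data/ carry over unchanged under DATA_DIR.

```python
import os
from pathlib import Path


def resolve_paths(env=None):
    """Derive runtime directories from DATA_DIR and FRONTEND_DIST.

    Defaults are the values quoted in the Space settings; the subdirectory
    names mirror backend/data/ (assumed to carry over to DATA_DIR).
    """
    env = os.environ if env is None else env
    data_dir = Path(env.get("DATA_DIR", "/data"))
    return {
        "uploaded_images": data_dir / "uploaded_images",
        "annotations": data_dir / "annotations",
        "synth_outputs": data_dir / "synth_outputs",
        "frontend_dist": Path(env.get("FRONTEND_DIST", "/app/frontend_dist")),
    }


def static_mounts(paths):
    """(URL prefix, directory) pairs; the catch-all "/" SPA mount comes last."""
    return [
        ("/images", paths["uploaded_images"]),
        ("/static/synthetic", paths["synth_outputs"]),
        ("/", paths["frontend_dist"]),
    ]
```

In a FastAPI app, each pair could be registered with app.mount(prefix, StaticFiles(directory=str(path)), name=...) from fastapi.staticfiles, passing html=True for the "/" mount so it serves index.html. Register these after including the /api routers: Starlette matches routes in order, and a mount at "/" would otherwise capture every request.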

Advanced Effects

  1. Paper Textures: Realistic paper fiber patterns using Perlin noise
  2. Aging Effects: Edge darkening and aging patterns
  3. Physical Damage: Fold lines, creases, and ink bleeding
  4. Scanner Artifacts: Dust, compression artifacts, scanning lines
  5. Geometric Distortions: Perspective changes, cylindrical warping
  6. Lighting Effects: Shadows and lens distortions
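To make the paper-texture idea concrete, here is a minimal pure-Python sketch of 2D Perlin gradient noise summed over several octaves and mapped to a light paper tone. It is illustrative only and is not the implementation in backgrounds.py or effects.py; the eight-gradient table, the base tone of 235, and the contrast of 20 are arbitrary choices for this example.

```python
import math
import random

# Eight gradient directions (axis-aligned and diagonal), a common 2D choice.
_GRADIENTS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, 1), (1, -1), (-1, -1)]


def make_permutation(seed):
    """Shuffled 0..255 table, doubled so perm[i + 1] never overflows."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table + table


def _fade(t):
    # Perlin's quintic smoothstep 6t^5 - 15t^4 + 10t^3.
    return t * t * t * (t * (t * 6 - 15) + 10)


def _lerp(a, b, t):
    return a + t * (b - a)


def _dot_grad(hash_value, x, y):
    gx, gy = _GRADIENTS[hash_value & 7]
    return gx * x + gy * y


def perlin(x, y, perm):
    """2D gradient noise; exactly 0 at integer lattice points."""
    x0, y0 = math.floor(x), math.floor(y)
    xi, yi = x0 & 255, y0 & 255
    xf, yf = x - x0, y - y0
    u, v = _fade(xf), _fade(yf)
    aa = perm[perm[xi] + yi]
    ab = perm[perm[xi] + yi + 1]
    ba = perm[perm[xi + 1] + yi]
    bb = perm[perm[xi + 1] + yi + 1]
    bottom = _lerp(_dot_grad(aa, xf, yf), _dot_grad(ba, xf - 1, yf), u)
    top = _lerp(_dot_grad(ab, xf, yf - 1), _dot_grad(bb, xf - 1, yf - 1), u)
    return _lerp(bottom, top, v)


def paper_texture(width, height, scale=32.0, octaves=4, seed=0,
                  base=235, contrast=20):
    """Grayscale rows (0-255): multi-octave noise around a light paper tone."""
    perm = make_permutation(seed)
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            total, amplitude, frequency, norm = 0.0, 1.0, 1.0 / scale, 0.0
            for _ in range(octaves):
                total += amplitude * perlin(x * frequency, y * frequency, perm)
                norm += amplitude
                amplitude *= 0.5
                frequency *= 2.0
            value = base + contrast * total / norm
            row.append(max(0, min(255, round(value))))
        rows.append(row)
    return rows
```

Each octave doubles the frequency and halves the amplitude, so coarse blotches and fine grain combine. A per-pixel Python loop is slow for full-size pages; a vectorized NumPy version would be the practical choice.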

Font Requirements

The generator requires appropriate fonts for text rendering. Default configuration expects:

  • Font directory: /content/static/
  • Font file: NotoSansOriya_ExtraCondensed-Regular.ttf

You can specify custom fonts using --font-dir and --font parameters.

Performance Tips

  • Use --max-samples to limit processing for large datasets
  • Disable advanced effects with --no-advanced-effects for faster generation
  • Use multiprocessing with --use-multiprocessing for batch jobs
  • Adjust image dimensions to balance quality and speed

Error Handling

The package handles errors in the following ways:

  • Graceful fallbacks for missing dependencies
  • Detailed logging for debugging
  • Validation of input parameters
  • Safe handling of malformed datasets

Contributing

The modular structure makes it easy to extend. The generator modules live in backend/app/services/synthetic/:

  • Add new effects in effects.py
  • Implement new background styles in backgrounds.py
  • Create custom transformations in transformations.py
  • Extend dataset processing in huggingface_processor.py

License

[Add your license information here]


Note: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.