metadata

title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: 1.0.0
pinned: false

Odia OCR Annotation + Synthetic Text Generator

A unified repository that provides:

An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
A synthetic text generator (exposed via the backend API) that renders Odia/Sanskrit-like text on realistic paper backgrounds with visual effects, and can process HuggingFace datasets.

Repository Structure

backend/
- app/main.py: FastAPI app with two routers: /api/ocr and /api/synthetic
- app/api/routers/ocr.py: OCR endpoints (upload, OCR, annotations import/export)
- app/api/routers/synthetic.py: Synthetic generation endpoints
- app/services/: Shared services
  - ocr_processor.py: Gemini OCR
  - annotations.py: CSV/JSON I/O
  - synthetic/: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
- data/: runtime storage
  - uploaded_images/: uploaded images (served at /images)
  - annotations/: annotations.csv and JSON
  - synth_outputs/: generated images and CSVs (served at /static/synthetic)
- requirements.txt: backend dependencies
frontend/
- Vite + React + Tailwind app
- Routes: /ocr (annotation UI) and /synthetic (generator UI)
content/static/: Noto Sans Oriya fonts used by the generator

Run Locally

Backend

pip install -r backend/requirements.txt
From backend/: uvicorn app.main:app --reload
Static mounts:
- /images → backend/data/uploaded_images
- /static/synthetic → backend/data/synth_outputs

Frontend

cd frontend && npm install && npm run dev
Open http://localhost:5173
Use navigation to switch between OCR and Synthetic pages

OCR API (FastAPI)

POST /api/ocr/upload:
- Multipart form data; the image files go in the field named files
- Stores images in backend/data/uploaded_images
POST /api/ocr/process:
- JSON: { "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }
- Returns: { "img1.png": "extracted text", ... }
GET /api/ocr/annotations:
- Returns the current annotations and which of their images are present (valid) or missing
POST /api/ocr/save:
- JSON: { "<filename>": { "extracted_text": "...", "validated_text": "..." } }
- Saves to CSV and JSON in backend/data/annotations
POST /api/ocr/import:
- Multipart: file (CSV), image_folder (e.g., uploaded_images)
- Validates the CSV and returns the annotations together with the presence status of each image
POST /api/ocr/export:
- JSON: { "annotations": {...}, "validated_texts": {...} }
- Returns a downloadable CSV
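
As a rough illustration of the request shapes listed above, the following Python sketch builds the JSON bodies for /api/ocr/process and /api/ocr/save and posts them with the standard library. The base URL (uvicorn's default port 8000), the helper names, and the fallback of validated_text to the extracted text are assumptions made for this example, not part of the repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def build_process_payload(api_key, image_filenames):
    """Body for POST /api/ocr/process, following the documented shape."""
    return {"api_key": api_key, "image_filenames": list(image_filenames)}


def build_save_payload(ocr_results, corrections=None):
    """Body for POST /api/ocr/save.

    ocr_results maps filename -> extracted text (the /process response shape).
    corrections optionally maps filename -> human-validated text. When a file
    has no correction, validated_text falls back to the extracted text
    (an assumption of this sketch, not documented backend behaviour).
    """
    corrections = corrections or {}
    return {
        name: {
            "extracted_text": text,
            "validated_text": corrections.get(name, text),
        }
        for name, text in ocr_results.items()
    }


def post_json(path, payload, base_url=BASE_URL):
    """POST a JSON body and decode the JSON response (needs a running backend)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, post_json("/api/ocr/process", build_process_payload(key, ["img1.png"])) should return the filename-to-text mapping, which can be passed to build_save_payload after review.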

Note: Legacy endpoints (/upload/, /process-ocr/, etc.) are temporarily supported for the older UI. Prefer /api/ocr/... going forward.

Synthetic API (FastAPI)

POST /api/synthetic/generate
- Modes: single, comprehensive, ultra-realistic, huggingface
- Request body examples:
  - Non-HF: { "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }
  - HF CSV: { "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }
- Response:
  - Non-HF: { "status": "ok", "output_dir": "/static/synthetic/<job_id>" }
  - HF: { "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }
- Outputs are stored under backend/data/synth_outputs/<job_id>/ and served publicly at /static/synthetic/<job_id>/...
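
The request and response shapes above can be sketched in Python as follows. This is only an illustration: the helper names, the base URL, the assumption that every non-HuggingFace mode takes a text field, and the defaults copied from the example bodies are not confirmed by the backend code.

```python
from urllib.parse import urljoin

# Modes listed in this README.
VALID_MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}


def build_generate_request(mode, output_subdir, text=None, dataset_url=None,
                           text_column="text", max_samples=100):
    """Build a JSON body for POST /api/synthetic/generate.

    Defaults for text_column and max_samples mirror the README example;
    the server's real defaults are unknown (assumption).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode needs dataset_url")
        return {
            "mode": mode,
            "dataset_url": dataset_url,
            "text_column": text_column,
            "max_samples": max_samples,
            "output_subdir": output_subdir,
        }
    if not text:
        raise ValueError(f"{mode} mode needs text")
    return {"mode": mode, "text": text, "output_subdir": output_subdir}


def output_urls(response, base_url="http://localhost:8000"):
    """Turn the relative paths in a response into absolute, browsable URLs."""
    keys = ("output_dir", "csv", "images_dir")
    return {key: urljoin(base_url, response[key]) for key in keys if key in response}
```

Validating the mode on the client side is a convenience of this sketch; the backend may perform its own validation and report errors differently.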

Fonts

The generator uses fonts from content/static/.
Default: NotoSansOriya_Condensed-Regular.ttf (configurable). Ensure the directory exists.

Effects & Styles

Paper styles: lined paper, old paper, birch, parchment
Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps
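
To give a feel for the simplest effects in this list (brightness, contrast, noise), here is a standalone sketch in pure Python that operates on a grayscale image stored as a list of pixel rows. It does not reproduce the generator's effects module, whose internals are not shown in this README; the function names and parameters are hypothetical.

```python
import random


def clamp(value):
    """Keep a pixel value inside the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def adjust_brightness_contrast(image, brightness=0.0, contrast=1.0):
    """Scale each pixel around mid-gray (128) by contrast, then add brightness."""
    return [
        [clamp((pixel - 128) * contrast + 128 + brightness) for pixel in row]
        for row in image
    ]


def add_gaussian_noise(image, sigma=8.0, seed=None):
    """Add zero-mean Gaussian noise to every pixel; a seed makes it repeatable."""
    rng = random.Random(seed)
    return [[clamp(pixel + rng.gauss(0.0, sigma)) for pixel in row] for row in image]


# Example: a tiny 2x2 "page" with light paper on top and dark ink below.
page = [[200, 200], [50, 50]]
brighter = adjust_brightness_contrast(page, brightness=10)
noisy = add_gaussian_noise(brighter, sigma=5.0, seed=1)
```

A real pipeline would more likely use NumPy or an imaging library for speed; plain lists are used here only to keep the example dependency-free.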

Notes

The backend expects the Gemini API key to be provided per-request to /api/ocr/process. Do not hardcode keys server-side.
For HuggingFace datasets, the backend uses the datasets library when possible and otherwise downloads the CSV from its raw URL.
You can browse generated outputs via the links returned by /api/synthetic/generate.

Deploy to Hugging Face Spaces (Docker)

This repo includes a multi-stage Dockerfile that deploys both the backend and the built frontend as a single Space.

Steps:

Create a new Space → Type: Docker
Push this repository to the Space
In Space Settings:
- Enable Persistent Storage
- (Optional) Add Secrets/Env Vars as needed, e.g., DATA_DIR=/data (already the default) and FRONTEND_DIST=/app/frontend_dist
The container exposes port 7860 by default.

What the image does:

Builds the frontend (frontend/) and copies the dist/ to /app/frontend_dist
Installs backend dependencies and runs uvicorn app.main:app from backend/
Serves:
- API at /api/...
- Uploaded images at /images
- Synthetic outputs at /static/synthetic
- Frontend SPA at / (served from /app/frontend_dist)
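
The relationship between the environment variables and the served paths can be sketched as below. The subdirectory names under DATA_DIR are assumed to mirror the backend/data/ layout described earlier, and both helper functions are hypothetical; the actual resolution logic in app/main.py may differ.

```python
import os
from pathlib import PurePosixPath


def resolve_paths(env=None):
    """Resolve storage directories from DATA_DIR and FRONTEND_DIST.

    Defaults follow this README: /data and /app/frontend_dist.
    """
    env = os.environ if env is None else env
    data_dir = PurePosixPath(env.get("DATA_DIR", "/data"))
    return {
        "uploaded_images": data_dir / "uploaded_images",
        "annotations": data_dir / "annotations",
        "synth_outputs": data_dir / "synth_outputs",
        "frontend_dist": PurePosixPath(env.get("FRONTEND_DIST", "/app/frontend_dist")),
    }


def file_for_url(url_path, paths):
    """Map a public URL path under a static mount to the file backing it.

    Returns None for paths outside the two static mounts. No sanitization is
    done here; a real server must guard against path traversal.
    """
    mounts = (("/images/", "uploaded_images"), ("/static/synthetic/", "synth_outputs"))
    for prefix, key in mounts:
        if url_path.startswith(prefix):
            return paths[key] / url_path[len(prefix):]
    return None
```

For example, with the defaults, file_for_url("/images/img1.png", resolve_paths({})) points at /data/uploaded_images/img1.png, which is why Persistent Storage matters for keeping uploads across restarts.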

Paper Textures: Realistic paper fiber patterns using Perlin noise
Aging Effects: Edge darkening and aging patterns
Physical Damage: Fold lines, creases, and ink bleeding
Scanner Artifacts: Dust, compression artifacts, scanning lines
Geometric Distortions: Perspective changes, cylindrical warping
Lighting Effects: Shadows and lens distortions

Font Requirements

The generator requires appropriate fonts for text rendering. Default configuration expects:

Font directory: /content/static/
Font file: NotoSansOriya_ExtraCondensed-Regular.ttf

You can specify custom fonts using --font-dir and --font parameters.

Performance Tips

Use --max-samples to limit processing for large datasets
Disable advanced effects with --no-advanced-effects for faster generation
Use multiprocessing with --use-multiprocessing for batch jobs
Adjust image dimensions to balance quality and speed
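
The flags mentioned in this README could be wired up with argparse roughly as shown below. This is a hypothetical reconstruction: only the flag names and the font defaults come from the text above, while the argument types, the remaining defaults, and the parser itself are assumptions, since the generator's actual command-line entry point is not shown here.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the flags named in this README."""
    parser = argparse.ArgumentParser(description="Synthetic text generator (sketch)")
    parser.add_argument("--font-dir", default="/content/static/",
                        help="directory containing the font files")
    parser.add_argument("--font", default="NotoSansOriya_ExtraCondensed-Regular.ttf",
                        help="font file name inside --font-dir")
    parser.add_argument("--max-samples", type=int, default=None,
                        help="limit the number of dataset rows processed")
    parser.add_argument("--no-advanced-effects", dest="advanced_effects",
                        action="store_false",
                        help="skip the slower advanced effects")
    parser.add_argument("--use-multiprocessing", action="store_true",
                        help="process batch jobs in parallel")
    return parser


# Example: a fast run limited to 50 samples without advanced effects.
args = build_parser().parse_args(["--max-samples", "50", "--no-advanced-effects"])
```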

Error Handling

The package includes comprehensive error handling:

Graceful fallbacks for missing dependencies
Detailed logging for debugging
Validation of input parameters
Safe handling of malformed datasets
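
As one possible reading of "safe handling of malformed datasets", the sketch below reads a text column from CSV content, skips empty or missing cells, and honours a sample limit. It is an illustration only; the function name and its behaviour are assumptions and do not describe huggingface_processor.py.

```python
import csv
import io


def read_texts(csv_text, text_column="text", max_samples=None):
    """Return (texts, skipped_rows) from CSV content.

    Rows whose text cell is missing or blank are counted and skipped instead
    of aborting the whole run. A missing column is reported immediately.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"column {text_column!r} not found in {reader.fieldnames}")
    texts, skipped = [], 0
    for row in reader:
        value = (row.get(text_column) or "").strip()
        if not value:
            skipped += 1
            continue
        texts.append(value)
        if max_samples is not None and len(texts) >= max_samples:
            break
    return texts, skipped
```

Returning the skipped-row count alongside the texts lets the caller log how much of the dataset was unusable without treating it as a failure.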

Contributing

The modular structure makes it easy to extend:

Add new effects in effects.py
Implement new background styles in backgrounds.py
Create custom transformations in transformations.py
Extend dataset processing in huggingface_processor.py

License

[Add your license information here]

Note: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.