metadata
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: 1.0.0
pinned: false
Odia OCR Annotation + Synthetic Text Generator
A unified repository that provides:
- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
- A synthetic text generator (exposed via backend API) to render Odia/Sanskrit-like text with realistic paper/effects, including HuggingFace dataset processing.
Repository Structure
- backend/
  - app/main.py: FastAPI app with two routers: /api/ocr and /api/synthetic
  - app/api/routers/ocr.py: OCR endpoints (upload, OCR, annotations import/export)
  - app/api/routers/synthetic.py: Synthetic generation endpoints
  - app/services/: Shared services
    - ocr_processor.py: Gemini OCR
    - annotations.py: CSV/JSON I/O
    - synthetic/: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
  - data/: runtime storage
    - uploaded_images/: uploaded images (served at /images)
    - annotations/: annotations.csv and JSON
    - synth_outputs/: generated images and CSVs (served at /static/synthetic)
  - requirements.txt: backend dependencies
- frontend/: Vite + React + Tailwind app
  - Routes: /ocr (annotation UI) and /synthetic (generator UI)
- content/static/: NotoSans Oriya fonts used by generator
Run Locally
- Backend
  - pip install -r backend/requirements.txt
  - From backend/: uvicorn app.main:app --reload
  - Static mounts:
    - /images → backend/data/uploaded_images
    - /static/synthetic → backend/data/synth_outputs
- Frontend
  - cd frontend && npm install && npm run dev
  - Open http://localhost:5173
  - Use navigation to switch between OCR and Synthetic pages
OCR API (FastAPI)
- POST /api/ocr/upload:
  - Multipart files field: files
  - Stores images in backend/data/uploaded_images
- POST /api/ocr/process:
  - JSON: { "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }
  - Returns: { "img1.png": "extracted text", ... }
- GET /api/ocr/annotations:
  - Returns current annotations, valid/missing images
- POST /api/ocr/save:
  - JSON: { "<filename>": { "extracted_text": "...", "validated_text": "..." } }
  - Saves to CSV and JSON in backend/data/annotations
- POST /api/ocr/import:
  - Multipart: file (CSV), image_folder (e.g., uploaded_images)
  - Validates and returns annotations + image presence
- POST /api/ocr/export:
  - JSON: { annotations: {...}, validated_texts: {...} }
  - Returns a downloadable CSV
Note: Legacy endpoints (/upload/, /process-ocr/, etc.) are temporarily supported for the older UI. Prefer /api/ocr/... going forward.
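As a rough illustration of the request shapes listed above, the following Python sketch builds the JSON bodies for /api/ocr/process and /api/ocr/save and posts them with the standard library. The helper names (build_process_payload, build_save_payload, post_json) and the base URL are assumptions made for this example; they are not part of the repository.

```python
import json
import urllib.request

# Assumption: the backend was started with uvicorn's default host and port.
BASE_URL = "http://localhost:8000"


def build_process_payload(api_key, image_filenames):
    """Build the JSON body for POST /api/ocr/process."""
    return {"api_key": api_key, "image_filenames": list(image_filenames)}


def build_save_payload(extracted, validated):
    """Build the JSON body for POST /api/ocr/save.

    `extracted` and `validated` map filename -> text. A filename that has
    no validated text yet gets an empty string.
    """
    return {
        name: {"extracted_text": text, "validated_text": validated.get(name, "")}
        for name, text in extracted.items()
    }


def post_json(path, payload, base_url=BASE_URL):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `post_json("/api/ocr/process", build_process_payload(key, ["img1.png"]))` should return the filename-to-text mapping described above, which can then be reviewed and passed to `build_save_payload`.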
Synthetic API (FastAPI)
- POST /api/synthetic/generate
  - Modes: single, comprehensive, ultra-realistic, huggingface
  - Request body examples:
    - Non-HF: { "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }
    - HF CSV: { "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }
  - Response:
    - Non-HF: { "status": "ok", "output_dir": "/static/synthetic/<job_id>" }
    - HF: { "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }
  - Outputs are stored under backend/data/synth_outputs/<job_id>/ and publicly served at /static/synthetic/<job_id>/....
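The request and response shapes above can be sketched on the client side as follows. This is a minimal sketch under stated assumptions: build_generate_payload and output_urls are hypothetical helpers, and it assumes that all non-HF modes accept a text field the way the single example does.

```python
MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}


def build_generate_payload(mode, output_subdir, text=None, dataset_url=None,
                           text_column="text", max_samples=None):
    """Build a JSON body for POST /api/synthetic/generate."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode needs a dataset_url")
        payload["dataset_url"] = dataset_url
        payload["text_column"] = text_column
        if max_samples is not None:
            payload["max_samples"] = max_samples
    elif text is not None:
        # Assumption: non-HF modes take the text to render in "text".
        payload["text"] = text
    return payload


def output_urls(response, base_url):
    """Turn the server-relative paths in a response into absolute URLs."""
    base = base_url.rstrip("/")
    return {
        key: base + response[key]
        for key in ("output_dir", "csv", "images_dir")
        if key in response
    }
```

Keeping the payload construction separate from the HTTP call makes it easy to check a request body before sending it, and `output_urls` gives browsable links for whichever fields the response contains.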
Fonts
- Generator uses fonts from content/static/.
- Default: NotoSansOriya_Condensed-Regular.ttf (configurable). Ensure the directory exists.
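One way to check the font setup before starting a generation job is sketched below. The resolve_font helper is hypothetical and only illustrates the "ensure the directory exists" step; the generator's own font loading may differ.

```python
from pathlib import Path

DEFAULT_FONT = "NotoSansOriya_Condensed-Regular.ttf"


def resolve_font(font_dir="content/static", font_name=DEFAULT_FONT):
    """Return the path to the font file, or raise a descriptive error."""
    directory = Path(font_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"font directory not found: {directory}")
    font_path = directory / font_name
    if not font_path.is_file():
        # List what is actually there to make misnamed files easy to spot.
        available = sorted(p.name for p in directory.glob("*.ttf"))
        raise FileNotFoundError(
            f"font {font_name!r} not found in {directory}; available: {available}"
        )
    return font_path
```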
Effects & Styles
- Paper styles: lined paper, old paper, birch, parchment
- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps
Notes
- The backend expects the Gemini API key to be provided per-request to /api/ocr/process. Do not hardcode keys server-side.
- For HuggingFace datasets, the backend uses the datasets library when possible, or downloads raw CSV URLs.
- You can browse generated outputs via the links returned by /api/synthetic/generate.
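The raw-CSV fallback mentioned above could look roughly like the sketch below, which pulls up to max_samples non-empty values from a text column. It uses only the standard library and is not the code in huggingface_processor.py; read_text_column is a hypothetical name.

```python
import csv
import io


def read_text_column(csv_text, text_column="text", max_samples=None):
    """Return non-empty values of `text_column`, capped at `max_samples`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise KeyError(f"column {text_column!r} not found in CSV header")
    samples = []
    for row in reader:
        # Stop before reading further once the cap is reached.
        if max_samples is not None and len(samples) >= max_samples:
            break
        value = (row.get(text_column) or "").strip()
        if value:  # skip blank cells
            samples.append(value)
    return samples
```

The CSV text itself would come from downloading the dataset_url given in the request; that step is left out here so the parsing logic can be checked in isolation.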
Deploy to Hugging Face Spaces (Docker)
This repo includes a multi-stage Dockerfile to deploy both backend and the built frontend as a single Space.
Steps:
- Create a new Space → Type: Docker
- Push this repository to the Space
- In Space Settings:
  - Enable Persistent Storage
  - (Optional) Add Secrets/Env Vars as needed, e.g., DATA_DIR=/data (default already) and FRONTEND_DIST=/app/frontend_dist
- The container exposes port 7860 by default.
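As a sketch of how the two environment variables above might be read at startup, the following helper falls back to the documented defaults. It is an illustration only: load_settings is a hypothetical name, and placing the three storage folders directly under DATA_DIR is an assumption based on the local backend/data/ layout.

```python
import os
from pathlib import Path


def load_settings(environ=None):
    """Resolve storage and frontend paths from environment variables."""
    env = os.environ if environ is None else environ
    data_dir = Path(env.get("DATA_DIR", "/data"))
    return {
        "data_dir": data_dir,
        # Assumption: same subfolder names as backend/data/ in local runs.
        "uploaded_images": data_dir / "uploaded_images",
        "annotations": data_dir / "annotations",
        "synth_outputs": data_dir / "synth_outputs",
        "frontend_dist": Path(env.get("FRONTEND_DIST", "/app/frontend_dist")),
    }
```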
What the image does:
- Builds the frontend (frontend/) and copies the dist/ to /app/frontend_dist
- Installs backend dependencies and runs uvicorn app.main:app from backend/
- Serves:
  - API at /api/...
  - Uploaded images at /images
  - Synthetic outputs at /static/synthetic
  - Frontend SPA at / (served from /app/frontend_dist)
- Paper Textures: Realistic paper fiber patterns using Perlin noise
- Aging Effects: Edge darkening and aging patterns
- Physical Damage: Fold lines, creases, and ink bleeding
- Scanner Artifacts: Dust, compression artifacts, scanning lines
- Geometric Distortions: Perspective changes, cylindrical warping
- Lighting Effects: Shadows and lens distortions
Font Requirements
The generator requires appropriate fonts for text rendering. Default configuration expects:
- Font directory: /content/static/
- Font file: NotoSansOriya_ExtraCondensed-Regular.ttf
You can specify custom fonts using --font-dir and --font parameters.
Performance Tips
- Use --max-samples to limit processing for large datasets
- Disable advanced effects with --no-advanced-effects for faster generation
- Use multiprocessing with --use-multiprocessing for batch jobs
- Adjust image dimensions to balance quality and speed
Error Handling
The package includes comprehensive error handling:
- Graceful fallbacks for missing dependencies
- Detailed logging for debugging
- Validation of input parameters
- Safe handling of malformed datasets
Contributing
The modular structure makes it easy to extend:
- Add new effects in effects.py
- Implement new background styles in backgrounds.py
- Create custom transformations in transformations.py
- Extend dataset processing in huggingface_processor.py
License
[Add your license information here]
Note: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.