---
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: "1.0.0"
pinned: false
---

# Odia OCR Annotation + Synthetic Text Generator

A unified repository that provides:
- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
- A synthetic text generator (exposed via the backend API) that renders Odia/Sanskrit-like text on realistic paper backgrounds with visual effects, including Hugging Face dataset processing.

## Repository Structure

- `backend/`
  - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic`
  - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export)
  - `app/api/routers/synthetic.py`: Synthetic generation endpoints
  - `app/services/`: Shared services
    - `ocr_processor.py`: Gemini OCR
    - `annotations.py`: CSV/JSON I/O
    - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
  - `data/`: runtime storage
    - `uploaded_images/`: uploaded images (served at `/images`)
    - `annotations/`: `annotations.csv` and JSON
    - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`)
  - `requirements.txt`: backend dependencies
- `frontend/`
  - Vite + React + Tailwind app
  - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI)
- `content/static/`: Noto Sans Oriya fonts used by the generator

## Run Locally

1) Backend
- `pip install -r backend/requirements.txt`
- From `backend/`: `uvicorn app.main:app --reload`
- Static mounts:
  - `/images` → `backend/data/uploaded_images`
  - `/static/synthetic` → `backend/data/synth_outputs`

2) Frontend
- `cd frontend && npm install && npm run dev`
- Open `http://localhost:5173`
- Use navigation to switch between OCR and Synthetic pages

## OCR API (FastAPI)

- `POST /api/ocr/upload`:
  - Multipart files field: `files`
  - Stores images in `backend/data/uploaded_images`
- `POST /api/ocr/process`:
  - JSON: `{ "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }`
  - Returns: `{ "img1.png": "extracted text", ... }`
- `GET /api/ocr/annotations`:
  - Returns current annotations, valid/missing images
- `POST /api/ocr/save`:
  - JSON: `{ "<filename>": { "extracted_text": "...", "validated_text": "..." } }`
  - Saves to CSV and JSON in `backend/data/annotations`
- `POST /api/ocr/import`:
  - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`)
  - Validates and returns annotations + image presence
- `POST /api/ocr/export`:
  - JSON: `{ "annotations": {...}, "validated_texts": {...} }`
  - Returns a downloadable CSV
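
The payload shape accepted by `/api/ocr/save` maps naturally onto CSV rows. The following is a minimal sketch of that round trip, not the backend's actual code: the column names (`image_filename`, `extracted_text`, `validated_text`) and the helper names `annotations_to_csv` and `csv_to_annotations` are assumptions made for illustration, and the real `annotations.csv` layout may differ.

```python
import csv
import io

# Hypothetical column names; the real annotations.csv layout may differ.
FIELDNAMES = ["image_filename", "extracted_text", "validated_text"]


def annotations_to_csv(annotations: dict) -> str:
    """Flatten a /api/ocr/save-style payload into CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    # Sort by filename so the output is stable across runs.
    for filename in sorted(annotations):
        entry = annotations[filename]
        writer.writerow({
            "image_filename": filename,
            "extracted_text": entry.get("extracted_text", ""),
            "validated_text": entry.get("validated_text", ""),
        })
    return buffer.getvalue()


def csv_to_annotations(csv_text: str) -> dict:
    """Rebuild the save-style payload from CSV text produced above."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {
        row["image_filename"]: {
            "extracted_text": row["extracted_text"],
            "validated_text": row["validated_text"],
        }
        for row in reader
    }
```

Using the `csv` module rather than manual string joining matters here because OCR text frequently contains commas, quotes, and line breaks, all of which need proper quoting.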

Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward.
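
A client call to the preferred `/api/ocr/process` endpoint could look roughly like the sketch below. It uses only the standard library and assumes the backend is reachable at `http://localhost:8000` (uvicorn's default port); the function names `build_process_request` and `run_ocr` are hypothetical and not part of this repository.

```python
import json
import urllib.request

# Assumption: backend started with `uvicorn app.main:app` on the default port.
BASE_URL = "http://localhost:8000"


def build_process_request(api_key: str, image_filenames: list[str],
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/ocr/process without sending it."""
    body = json.dumps({
        "api_key": api_key,  # supplied per request, never stored server-side
        "image_filenames": image_filenames,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/ocr/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(api_key: str, image_filenames: list[str],
            base_url: str = BASE_URL) -> dict:
    """Send the request and return the {filename: extracted_text} mapping."""
    request = build_process_request(api_key, image_filenames, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The filenames passed in `image_filenames` are expected to have been uploaded via `/api/ocr/upload` first.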

## Synthetic API (FastAPI)

- `POST /api/synthetic/generate`
  - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface`
  - Request body examples:
    - Non-HF:
      `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }`
    - HF CSV:
      `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }`
  - Response:
    - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>" }`
    - HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }`
  - Outputs are stored under `backend/data/synth_outputs/<job_id>/` and publicly served at `/static/synthetic/<job_id>/...`.
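
As an illustration of the request and response shapes above, the sketch below builds a request body for each mode and maps a returned `output_dir` URL back to its location on disk. The helper names and the validation rule (that `huggingface` mode requires `dataset_url`) are assumptions for this example; the backend's own validation may differ.

```python
from pathlib import PurePosixPath

VALID_MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}
URL_PREFIX = PurePosixPath("/static/synthetic")
OUTPUT_ROOT = PurePosixPath("backend/data/synth_outputs")


def build_generate_body(mode, output_subdir, text=None, dataset_url=None,
                        text_column="text", max_samples=100) -> dict:
    """Assemble a JSON-serializable body for POST /api/synthetic/generate."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    body = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        # Assumed rule: dataset mode needs a source URL.
        if not dataset_url:
            raise ValueError("huggingface mode requires dataset_url")
        body.update(dataset_url=dataset_url, text_column=text_column,
                    max_samples=max_samples)
    elif text is not None:
        body["text"] = text
    return body


def output_url_to_path(output_dir_url: str) -> PurePosixPath:
    """Map '/static/synthetic/<job_id>/...' to the matching path on disk."""
    url_path = PurePosixPath(output_dir_url)
    # relative_to raises ValueError if the URL is not under the prefix.
    relative = url_path.relative_to(URL_PREFIX)
    if ".." in relative.parts:
        raise ValueError("path traversal is not allowed")
    return OUTPUT_ROOT / relative
```

For example, `output_url_to_path("/static/synthetic/demo/images")` resolves to `backend/data/synth_outputs/demo/images`, mirroring the static mount described above.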

## Fonts

- The generator uses fonts from `content/static/`.
- Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists.

## Effects & Styles

- Paper styles: lined paper, old paper, birch, parchment
- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps

## Notes

- The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side.
- For HuggingFace datasets, the backend uses `datasets` when possible, or downloads raw CSV URLs.
- You can browse generated outputs via the links returned by `/api/synthetic/generate`.

## Deploy to Hugging Face Spaces (Docker)

This repo includes a multi-stage Dockerfile that deploys both the backend and the built frontend as a single Space.

Steps:
- Create a new Space → Type: Docker
- Push this repository to the Space
- In Space Settings:
  - Enable Persistent Storage
  - (Optional) Add Secrets/Env Vars as needed, e.g., `DATA_DIR=/data` (already the default) and `FRONTEND_DIST=/app/frontend_dist`
- The container exposes port `7860` by default.

What the image does:
- Builds the frontend (`frontend/`) and copies the `dist/` to `/app/frontend_dist`
- Installs backend dependencies and runs `uvicorn app.main:app` from `backend/`
- Serves:
  - API at `/api/...`
  - Uploaded images at `/images`
  - Synthetic outputs at `/static/synthetic`
  - Frontend SPA at `/` (served from `/app/frontend_dist`)
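
The environment variables mentioned in the steps above could be resolved into concrete directories along these lines. This is a sketch under assumptions, not the backend's actual startup code: the function name `resolve_paths` is hypothetical, and it assumes the subdirectories under `DATA_DIR` mirror the `backend/data/` layout (`uploaded_images/`, `synth_outputs/`).

```python
import os
from pathlib import Path


def resolve_paths(env=None) -> dict:
    """Resolve storage and frontend directories from environment variables."""
    env = os.environ if env is None else env
    # Defaults follow the values named in the Space settings step.
    data_dir = Path(env.get("DATA_DIR", "/data"))
    frontend_dist = Path(env.get("FRONTEND_DIST", "/app/frontend_dist"))
    return {
        "data_dir": data_dir,
        # Assumed layout, mirroring backend/data/ in the repository.
        "uploaded_images": data_dir / "uploaded_images",  # served at /images
        "synth_outputs": data_dir / "synth_outputs",      # served at /static/synthetic
        "frontend_dist": frontend_dist,                   # served at /
    }
```

Passing a plain dictionary as `env` makes the function easy to check without touching the real process environment.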


## Advanced Effects

1. **Paper Textures**: Realistic paper fiber patterns using Perlin noise
2. **Aging Effects**: Edge darkening and aging patterns
3. **Physical Damage**: Fold lines, creases, and ink bleeding
4. **Scanner Artifacts**: Dust, compression artifacts, scanning lines
5. **Geometric Distortions**: Perspective changes, cylindrical warping
6. **Lighting Effects**: Shadows and lens distortions

## Font Requirements

The generator requires appropriate fonts for text rendering. Default configuration expects:
- Font directory: `/content/static/`
- Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf`

You can specify custom fonts using `--font-dir` and `--font` parameters.

## Performance Tips

- Use `--max-samples` to limit processing for large datasets
- Disable advanced effects with `--no-advanced-effects` for faster generation
- Use multiprocessing with `--use-multiprocessing` for batch jobs
- Adjust image dimensions to balance quality and speed

## Error Handling

The package includes comprehensive error handling:
- Graceful fallbacks for missing dependencies
- Detailed logging for debugging
- Validation of input parameters
- Safe handling of malformed datasets

## Contributing

The modular structure makes it easy to extend:
- Add new effects in `effects.py`
- Implement new background styles in `backgrounds.py`
- Create custom transformations in `transformations.py`
- Extend dataset processing in `huggingface_processor.py`

## License

[Add your license information here]

---

**Note**: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.