---
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: "1.0.0"
pinned: false
---

# Odia OCR Annotation + Synthetic Text Generator

A unified repository that provides:
- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
- A synthetic text generator (exposed via the backend API) that renders Odia/Sanskrit-like text with realistic paper textures and effects, including HuggingFace dataset processing.

## Repository Structure

- `backend/`
  - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic`
  - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export)
  - `app/api/routers/synthetic.py`: Synthetic generation endpoints
  - `app/services/`: Shared services
    - `ocr_processor.py`: Gemini OCR
    - `annotations.py`: CSV/JSON I/O
    - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
  - `data/`: runtime storage
    - `uploaded_images/`: uploaded images (served at `/images`)
    - `annotations/`: `annotations.csv` and JSON
    - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`)
  - `requirements.txt`: backend dependencies
- `frontend/`
  - Vite + React + Tailwind app
  - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI)
- `content/static/`: NotoSans Oriya fonts used by the generator

## Run Locally

1) Backend
- `pip install -r backend/requirements.txt`
- From `backend/`: `uvicorn app.main:app --reload`
- Static mounts:
  - `/images` → `backend/data/uploaded_images`
  - `/static/synthetic` → `backend/data/synth_outputs`

2) Frontend
- `cd frontend && npm install && npm run dev`
- Open `http://localhost:5173`
- Use navigation to switch between OCR and Synthetic pages
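The backend's two static mounts map URL prefixes onto local directories. The sketch below shows that mapping. It is an illustration of the idea, not the code in `app/main.py`. The helper name `url_to_disk_path` is hypothetical, and the `..` rejection reflects the kind of traversal guard a static server should apply.

```python
from pathlib import PurePosixPath

# Mount table from "Static mounts" above; paths are relative to the repo root.
STATIC_MOUNTS = {
    "/images": "backend/data/uploaded_images",
    "/static/synthetic": "backend/data/synth_outputs",
}


def url_to_disk_path(url_path: str) -> str:
    """Map a served URL path to the file it is read from (hypothetical helper).

    Raises ValueError for unknown prefixes or for paths containing "..",
    which could otherwise escape the mounted directory.
    """
    for prefix, directory in STATIC_MOUNTS.items():
        if url_path == prefix or url_path.startswith(prefix + "/"):
            relative = url_path[len(prefix):].lstrip("/")
            parts = PurePosixPath(relative).parts
            if ".." in parts:
                raise ValueError(f"path escapes mount: {url_path}")
            return str(PurePosixPath(directory, *parts))
    raise ValueError(f"no static mount for {url_path}")
```

For example, `/static/synthetic/demo_run_01/dataset.csv` resolves to `backend/data/synth_outputs/demo_run_01/dataset.csv`.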

## OCR API (FastAPI)

- `POST /api/ocr/upload`:
  - Multipart files field: `files`
  - Stores images in `backend/data/uploaded_images`
- `POST /api/ocr/process`:
  - JSON: `{ "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }`
  - Returns: `{ "img1.png": "extracted text", ... }`
- `GET /api/ocr/annotations`:
  - Returns the current annotations, along with which referenced images are present and which are missing
- `POST /api/ocr/save`:
  - JSON: `{ "<filename>": { "extracted_text": "...", "validated_text": "..." } }`
  - Saves to CSV and JSON in `backend/data/annotations`
- `POST /api/ocr/import`:
  - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`)
  - Validates and returns annotations + image presence
- `POST /api/ocr/export`:
  - JSON: `{ "annotations": {...}, "validated_texts": {...} }`
  - Returns a downloadable CSV

Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward.
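A typical OCR round trip chains `upload`, `process` and `save`. The sketch below is a standard-library client for that flow. It makes several assumptions:

- the backend runs locally on uvicorn's default `http://localhost:8000`;
- every endpoint replies with JSON;
- the helper names are hypothetical.

It seeds `validated_text` with the raw OCR output purely for illustration.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def encode_multipart(field: str, files: dict[str, bytes]) -> tuple[bytes, str]:
    """Build a multipart/form-data body with one part per file, all under `field`."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, data in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks += [header.encode(), data, b"\r\n"]
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def post(path: str, body: bytes, content_type: str) -> dict:
    """POST raw bytes and decode the reply (assumes the endpoint returns JSON)."""
    req = urllib.request.Request(BASE_URL + path, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def post_json(path: str, payload: dict) -> dict:
    return post(path, json.dumps(payload).encode("utf-8"), "application/json")


def ocr_round_trip(image_paths: list[Path], api_key: str) -> dict:
    """Upload images, run Gemini OCR on them, then save the OCR text as a first draft."""
    files = {p.name: p.read_bytes() for p in image_paths}
    post("/api/ocr/upload", *encode_multipart("files", files))
    texts = post_json("/api/ocr/process",
                      {"api_key": api_key, "image_filenames": list(files)})
    # validated_text starts as a copy of the OCR output; correct it before a real save.
    annotations = {name: {"extracted_text": text, "validated_text": text}
                   for name, text in texts.items()}
    post_json("/api/ocr/save", annotations)
    return annotations
```

Read the Gemini key from the environment or a prompt at call time, for example `ocr_round_trip([Path("img1.png")], api_key=...)`. This keeps it out of source and matches the per-request key policy described under Notes.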

## Synthetic API (FastAPI)

- `POST /api/synthetic/generate`
  - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface`
  - Request body examples:
    - Non-HF:
      `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }`
    - HF CSV:
      `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }`
  - Response:
    - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>" }`
    - HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }`
  - Outputs are stored under `backend/data/synth_outputs/<job_id>/` and publicly served at `/static/synthetic/<job_id>/...`.
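The generate response returns server-relative paths. The sketch below submits a job and turns those paths into absolute URLs you can open in a browser. It assumes a local backend at `http://localhost:8000`; the helper names `generate` and `absolute_links` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://localhost:8000"  # assumption: local uvicorn default


def generate(payload: dict) -> dict:
    """POST a job to /api/synthetic/generate and return the parsed JSON response."""
    req = urllib.request.Request(
        urljoin(BASE_URL, "/api/synthetic/generate"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def absolute_links(response: dict, base_url: str = BASE_URL) -> dict:
    """Turn the relative /static/synthetic/... paths in a response into full URLs."""
    keys = ("output_dir", "csv", "images_dir")  # csv/images_dir appear only for HF jobs
    return {k: urljoin(base_url, response[k]) for k in keys if k in response}
```

Usage: `absolute_links(generate({"mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01"}))`.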

## Fonts

- Generator uses fonts from `content/static/`.
- Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists.

## Effects & Styles

- Paper styles: lined paper, old paper, birch, parchment
- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps
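To give a feel for what a "washboard" warp does, here is a pure-Python sketch. Each pixel row is shifted sideways by a sine of its vertical position. This is an illustration only and is not taken from `transformations.py`. The `amplitude` and `period` parameters are hypothetical, and an image is modeled as a list of grayscale rows rather than an array.

```python
import math

Image = list[list[int]]  # grayscale rows: 0 = black ink, 255 = white paper


def washboard_warp(img: Image, amplitude: float = 3.0, period: float = 40.0,
                   fill: int = 255) -> Image:
    """Shift row y horizontally by round(amplitude * sin(2*pi*y / period)) pixels.

    Pixels that would come from outside the image are filled with `fill`.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = []
    for y, row in enumerate(img):
        shift = round(amplitude * math.sin(2 * math.pi * y / period))
        # Output pixel x samples input pixel x - shift, so a positive shift moves ink right.
        out.append([row[x - shift] if 0 <= x - shift < width else fill
                    for x in range(width)])
    return out
```

A cylinder warp follows the same pattern but varies the shift with a curved profile instead of a sine.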

## Notes

- The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side.
- For HuggingFace datasets, the backend uses `datasets` when possible, or downloads raw CSV URLs.
- You can browse generated outputs via the links returned by `/api/synthetic/generate`.
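The plain-CSV fallback for dataset processing can be sketched with the standard library. This covers only the raw-download path, not the `datasets` path. It reuses the `text_column` and `max_samples` request fields, and it skips rows with an empty text cell rather than failing. The function names and skip policy are assumptions, not the logic of `huggingface_processor.py`.

```python
import csv
import io
import logging
import urllib.request
from typing import Optional

log = logging.getLogger("hf_csv_fallback")


def texts_from_csv(raw: str, text_column: str = "text",
                   max_samples: Optional[int] = None) -> list[str]:
    """Collect non-empty values of `text_column`, skipping rows with an empty cell."""
    reader = csv.DictReader(io.StringIO(raw))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"column {text_column!r} not found in CSV header")
    texts: list[str] = []
    for row_no, row in enumerate(reader, start=1):
        if max_samples is not None and len(texts) >= max_samples:
            break
        value = (row.get(text_column) or "").strip()
        if not value:
            log.warning("skipping data row %d: empty %r", row_no, text_column)
            continue
        texts.append(value)
    return texts


def load_csv_texts(dataset_url: str, text_column: str = "text",
                   max_samples: Optional[int] = None) -> list[str]:
    """Download a raw CSV over HTTP(S) and extract its texts."""
    with urllib.request.urlopen(dataset_url) as resp:
        raw = resp.read().decode("utf-8")
    return texts_from_csv(raw, text_column, max_samples)
```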

## Deploy to Hugging Face Spaces (Docker)

This repo includes a multi-stage Dockerfile to deploy both backend and the built frontend as a single Space.

Steps:
- Create a new Space → Type: Docker
- Push this repository to the Space
- In Space Settings:
  - Enable Persistent Storage
  - (Optional) Add Secrets/Env Vars as needed, e.g., `DATA_DIR=/data` (already the default) and `FRONTEND_DIST=/app/frontend_dist`
- The container exposes port `7860` by default.

What the image does:
- Builds the frontend (`frontend/`) and copies the `dist/` to `/app/frontend_dist`
- Installs backend dependencies and runs `uvicorn app.main:app` from `backend/`
- Serves:
  - API at `/api/...`
  - Uploaded images at `/images`
  - Synthetic outputs at `/static/synthetic`
  - Frontend SPA at `/` (served from `/app/frontend_dist`)
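The two environment variables above control where the container reads and writes. The sketch below resolves them using the documented defaults. It assumes the container keeps the same subdirectory names under `DATA_DIR` as `backend/data/` uses locally. That layout and the `resolve_paths` helper are assumptions, not confirmed by the Dockerfile.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Paths:
    data_dir: Path
    frontend_dist: Path

    # Assumption: same subdirectory names as backend/data/ in local runs.
    @property
    def uploaded_images(self) -> Path:
        return self.data_dir / "uploaded_images"

    @property
    def annotations(self) -> Path:
        return self.data_dir / "annotations"

    @property
    def synth_outputs(self) -> Path:
        return self.data_dir / "synth_outputs"


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> Paths:
    """Read DATA_DIR and FRONTEND_DIST, falling back to the Space defaults."""
    env = os.environ if env is None else env
    return Paths(
        data_dir=Path(env.get("DATA_DIR", "/data")),
        frontend_dist=Path(env.get("FRONTEND_DIST", "/app/frontend_dist")),
    )
```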

## Effect Categories

The generator's realism effects fall into six categories:

1. **Paper Textures**: Realistic paper fiber patterns using Perlin noise
2. **Aging Effects**: Edge darkening and aging patterns
3. **Physical Damage**: Fold lines, creases, and ink bleeding
4. **Scanner Artifacts**: Dust, compression artifacts, scanning lines
5. **Geometric Distortions**: Perspective changes, cylindrical warping
6. **Lighting Effects**: Shadows and lens distortions

## Font Requirements

The generator needs an Odia-capable font for text rendering. The default configuration expects:
- Font directory: `/content/static/`
- Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf`

You can specify custom fonts using `--font-dir` and `--font` parameters.
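A font lookup that honors these settings might look like the sketch below. It uses the defaults listed above. The fallback to the first `.ttf` in the directory is a hypothetical convenience, not documented generator behavior.

```python
from pathlib import Path
from typing import Optional

DEFAULT_FONT = "NotoSansOriya_ExtraCondensed-Regular.ttf"


def resolve_font(font_dir: str = "/content/static/", font: Optional[str] = None) -> Path:
    """Return the requested font, or fall back to the first .ttf in font_dir."""
    directory = Path(font_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"font directory does not exist: {directory}")
    candidate = directory / (font or DEFAULT_FONT)
    if candidate.is_file():
        return candidate
    ttfs = sorted(directory.glob("*.ttf"))  # hypothetical fallback: any TrueType font
    if not ttfs:
        raise FileNotFoundError(f"no .ttf fonts found in {directory}")
    return ttfs[0]
```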

## Performance Tips

- Use `--max-samples` to limit processing for large datasets
- Disable advanced effects with `--no-advanced-effects` for faster generation
- Use multiprocessing with `--use-multiprocessing` for batch jobs
- Adjust image dimensions to balance quality and speed
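The `--max-samples` and `--use-multiprocessing` tips combine naturally in a batch runner. The sketch below caps the input and then renders either sequentially or with a process pool from `concurrent.futures`. Both modes preserve input order. The `render` callable is hypothetical and stands in for whatever produces one sample. With multiprocessing it must be a picklable top-level function.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, Optional


def run_batch(texts: Iterable[str], render: Callable[[str], str],
              max_samples: Optional[int] = None,
              use_multiprocessing: bool = False,
              workers: Optional[int] = None) -> list[str]:
    """Apply `render` to at most `max_samples` texts, optionally across processes."""
    items = list(texts)
    if max_samples is not None:
        items = items[:max_samples]
    if not use_multiprocessing:
        return [render(t) for t in items]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize > 1 cuts inter-process overhead when each render is quick.
        return list(pool.map(render, items, chunksize=8))
```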

## Error Handling

The package handles errors through:
- Graceful fallbacks for missing dependencies
- Detailed logging for debugging
- Validation of input parameters
- Safe handling of malformed datasets
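Input validation for `/api/synthetic/generate` can be sketched as a function that returns a list of problems. The modes come from the Synthetic API section. The per-mode rules are assumptions inferred from the request examples, not the backend's actual checks: HF needs an http(s) `dataset_url`, `max_samples` must be positive, and `single` needs `text`.

```python
from typing import Any

MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}


def validate_generate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems with a generate request body (empty list = valid)."""
    errors: list[str] = []
    mode = body.get("mode")
    if not isinstance(mode, str) or mode not in MODES:
        errors.append(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    elif mode == "huggingface":
        url = str(body.get("dataset_url", ""))
        if not url.startswith(("http://", "https://")):
            errors.append("huggingface mode needs an http(s) dataset_url")
        max_samples = body.get("max_samples")
        if max_samples is not None and (type(max_samples) is not int or max_samples <= 0):
            errors.append("max_samples must be a positive integer")
    elif mode == "single" and not str(body.get("text", "")).strip():
        errors.append("single mode needs non-empty text")  # assumed rule
    return errors
```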

## Contributing

The modular structure makes it easy to extend. The generator modules live in `backend/app/services/synthetic/`:
- Add new effects in `effects.py`
- Implement new background styles in `backgrounds.py`
- Create custom transformations in `transformations.py`
- Extend dataset processing in `huggingface_processor.py`
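One common way to make new effects pluggable is a small registry keyed by name, as sketched below. This is a hypothetical pattern, not the interface of `effects.py`. The `(image, rng)` effect signature, the `EFFECTS` table and the `dust_specks` example are all illustrative. A seeded `random.Random` keeps runs reproducible.

```python
import random
from typing import Callable, Dict

Image = list[list[int]]  # grayscale rows: 0 = black, 255 = white
Effect = Callable[[Image, random.Random], Image]

EFFECTS: Dict[str, Effect] = {}


def register_effect(name: str) -> Callable[[Effect], Effect]:
    """Decorator that adds an effect function to the EFFECTS table under `name`."""
    def decorator(fn: Effect) -> Effect:
        if name in EFFECTS:
            raise ValueError(f"effect {name!r} already registered")
        EFFECTS[name] = fn
        return fn
    return decorator


@register_effect("dust_specks")
def dust_specks(img: Image, rng: random.Random, rate: float = 0.01) -> Image:
    """Turn a random fraction of pixels black, like dust on a scanner glass."""
    return [[0 if rng.random() < rate else v for v in row] for row in img]


def apply_effects(img: Image, names: list[str], seed: int = 0) -> Image:
    """Apply the named effects in order, sharing one seeded RNG."""
    rng = random.Random(seed)
    for name in names:
        img = EFFECTS[name](img, rng)
    return img
```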

## License

[Add your license information here]

---

**Note**: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.