---
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: "1.0.0"
pinned: false
---
# Odia OCR Annotation + Synthetic Text Generator
A unified repository that provides:
- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
- A synthetic text generator (exposed via backend API) to render Odia/Sanskrit-like text with realistic paper/effects, including HuggingFace dataset processing.
## Repository Structure
- `backend/`
  - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic`
  - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export)
  - `app/api/routers/synthetic.py`: Synthetic generation endpoints
  - `app/services/`: Shared services
    - `ocr_processor.py`: Gemini OCR
    - `annotations.py`: CSV/JSON I/O
    - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
  - `data/`: runtime storage
    - `uploaded_images/`: uploaded images (served at `/images`)
    - `annotations/`: `annotations.csv` and JSON
    - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`)
  - `requirements.txt`: backend dependencies
- `frontend/`
  - Vite + React + Tailwind app
  - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI)
- `content/static/`: NotoSans Oriya fonts used by generator
## Run Locally
1) Backend
   - `pip install -r backend/requirements.txt`
   - From `backend/`: `uvicorn app.main:app --reload`
   - Static mounts:
     - `/images` → `backend/data/uploaded_images`
     - `/static/synthetic` → `backend/data/synth_outputs`
2) Frontend
   - `cd frontend && npm install && npm run dev`
   - Open `http://localhost:5173`
   - Use the navigation to switch between the OCR and Synthetic pages
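
The static mounts listed under the backend step can be read as a simple prefix-to-directory table. The sketch below is only an illustration of that mapping, not the backend's actual code: `STATIC_MOUNTS` copies the two mounts from this README, while `resolve_static_path` is a hypothetical helper name introduced here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Mount table copied from this README. The helper below is hypothetical and
# is not part of the backend.
STATIC_MOUNTS = {
    "/images": "backend/data/uploaded_images",
    "/static/synthetic": "backend/data/synth_outputs",
}


def resolve_static_path(url_path: str) -> Optional[PurePosixPath]:
    """Map a served URL path to the repository path it would be read from.

    Returns None when the path is not under a known mount or tries to
    escape the mount with '..'.
    """
    requested = PurePosixPath(url_path)
    for prefix, directory in STATIC_MOUNTS.items():
        try:
            # relative_to compares whole path components, so "/imagesX"
            # does not match the "/images" mount.
            relative = requested.relative_to(prefix)
        except ValueError:
            continue
        if ".." in relative.parts:
            return None
        return PurePosixPath(directory) / relative
    return None


if __name__ == "__main__":
    print(resolve_static_path("/images/scan_01.png"))
    print(resolve_static_path("/static/synthetic/demo_run_01/images/0.png"))
```

This can help when checking on disk why a URL under `/images` or `/static/synthetic` returns 404: the file has to exist at the mapped path.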
## OCR API (FastAPI)
- `POST /api/ocr/upload`:
  - Multipart files field: `files`
  - Stores images in `backend/data/uploaded_images`
- `POST /api/ocr/process`:
  - JSON: `{ "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }`
  - Returns: `{ "img1.png": "extracted text", ... }`
- `GET /api/ocr/annotations`:
  - Returns the current annotations and which images are valid or missing
- `POST /api/ocr/save`:
  - JSON: `{ "<filename>": { "extracted_text": "...", "validated_text": "..." } }`
  - Saves to CSV and JSON in `backend/data/annotations`
- `POST /api/ocr/import`:
  - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`)
  - Validates and returns annotations + image presence
- `POST /api/ocr/export`:
  - JSON: `{ "annotations": {...}, "validated_texts": {...} }`
  - Returns a downloadable CSV
Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward.
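
As a rough client-side illustration of the `process` and `save` endpoints above, the sketch below builds the two JSON requests with the standard library only. The base URL assumes uvicorn's default port 8000, and the function names `build_process_request` and `build_save_request` are hypothetical; the paths and JSON shapes are the ones listed in this section.

```python
import json
from typing import Dict, Iterable
from urllib.request import Request

# Assumption: the backend runs locally on uvicorn's default port.
BASE_URL = "http://localhost:8000"


def _json_post(url: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_process_request(api_key: str, image_filenames: Iterable[str],
                          base_url: str = BASE_URL) -> Request:
    """Request for POST /api/ocr/process; the key travels with each call."""
    payload = {"api_key": api_key, "image_filenames": list(image_filenames)}
    return _json_post(f"{base_url}/api/ocr/process", payload)


def build_save_request(annotations: Dict[str, Dict[str, str]],
                       base_url: str = BASE_URL) -> Request:
    """Request for POST /api/ocr/save, keyed by image filename."""
    return _json_post(f"{base_url}/api/ocr/save", annotations)


if __name__ == "__main__":
    req = build_process_request("<GEMINI_KEY>", ["img1.png"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With the backend running, a request built this way can be sent with `urllib.request.urlopen(req)` and the response body parsed with `json.load`.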
## Synthetic API (FastAPI)
- `POST /api/synthetic/generate`
  - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface`
  - Request body examples:
    - Non-HF: `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }`
    - HF CSV: `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }`
  - Response:
    - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>" }`
    - HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }`
- Outputs are stored under `backend/data/synth_outputs/<job_id>/` and publicly served at `/static/synthetic/<job_id>/...`.
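
The request bodies above can be assembled with a small helper. This is a minimal sketch, not the backend's validation logic: the mode names and field names come from the examples in this section, while `build_generate_payload`, `output_url`, and the rule that every non-HF mode needs `text` are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Mode names as listed in this README.
MODES = ("single", "comprehensive", "ultra-realistic", "huggingface")


def build_generate_payload(mode: str, *, text: Optional[str] = None,
                           dataset_url: Optional[str] = None,
                           text_column: str = "text",
                           max_samples: Optional[int] = None,
                           output_subdir: Optional[str] = None) -> Dict[str, Any]:
    """Build a JSON body for POST /api/synthetic/generate (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload: Dict[str, Any] = {"mode": mode}
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode needs dataset_url")
        payload["dataset_url"] = dataset_url
        payload["text_column"] = text_column
        if max_samples is not None:
            payload["max_samples"] = max_samples
    else:
        # Assumption: non-HF modes all take a `text` field, as in the
        # `single` example above.
        if not text:
            raise ValueError(f"{mode} mode needs text")
        payload["text"] = text
    if output_subdir:
        payload["output_subdir"] = output_subdir
    return payload


def output_url(base_url: str, response: Dict[str, Any]) -> str:
    """Join the server base URL with the `output_dir` returned by the API."""
    return base_url.rstrip("/") + response["output_dir"]


if __name__ == "__main__":
    print(build_generate_payload("single", text="some Odia text",
                                 output_subdir="demo_run_01"))
```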
## Fonts
- Generator uses fonts from `content/static/`.
- Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists.
## Effects & Styles
- Paper styles: lined paper, old paper, birch, parchment
- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps
## Notes
- The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side.
- For HuggingFace datasets, the backend uses `datasets` when possible, or downloads raw CSV URLs.
- You can browse generated outputs via the links returned by `/api/synthetic/generate`.
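
For the raw-CSV path mentioned in the notes above, the parsing step might look roughly like the following. This is a sketch under stated assumptions, not the code in `huggingface_processor.py`: it covers only reading a text column from already-downloaded CSV content, and `read_text_column` is a hypothetical name. The `text_column` and `max_samples` parameters mirror the request fields of `/api/synthetic/generate`.

```python
import csv
import io
from typing import List, Optional


def read_text_column(csv_text: str, text_column: str = "text",
                     max_samples: Optional[int] = None) -> List[str]:
    """Return non-empty values of `text_column`, capped at `max_samples`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise KeyError(f"column {text_column!r} not found in CSV header")
    texts: List[str] = []
    for row in reader:
        # Stop before reading more rows once the cap is reached.
        if max_samples is not None and len(texts) >= max_samples:
            break
        value = (row.get(text_column) or "").strip()
        if value:  # skip blank cells rather than rendering empty images
            texts.append(value)
    return texts


if __name__ == "__main__":
    sample = "id,text\n1,first line\n2,\n3,second line\n4,third line\n"
    print(read_text_column(sample, max_samples=2))
```

Skipping blank cells before counting means `max_samples` bounds the number of usable texts, not the number of rows scanned; whether the backend counts the same way is not stated in this README.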
## Deploy to Hugging Face Spaces (Docker)
This repo includes a multi-stage Dockerfile that deploys both the backend and the built frontend as a single Space.
Steps:
- Create a new Space → Type: Docker
- Push this repository to the Space
- In Space Settings:
  - Enable Persistent Storage
  - (Optional) Add Secrets/Env Vars as needed, e.g., `DATA_DIR=/data` (already the default) and `FRONTEND_DIST=/app/frontend_dist`
- The container exposes port `7860` by default.
What the image does:
- Builds the frontend (`frontend/`) and copies the `dist/` to `/app/frontend_dist`
- Installs backend dependencies and runs `uvicorn app.main:app` from `backend/`
- Serves:
  - API at `/api/...`
  - Uploaded images at `/images`
  - Synthetic outputs at `/static/synthetic`
  - Frontend SPA at `/` (served from `/app/frontend_dist`)
## Advanced Effects
1. **Paper Textures**: Realistic paper fiber patterns using Perlin noise
2. **Aging Effects**: Edge darkening and aging patterns
3. **Physical Damage**: Fold lines, creases, and ink bleeding
4. **Scanner Artifacts**: Dust, compression artifacts, scanning lines
5. **Geometric Distortions**: Perspective changes, cylindrical warping
6. **Lighting Effects**: Shadows and lens distortions
## Font Requirements
The generator requires appropriate fonts for text rendering. Default configuration expects:
- Font directory: `/content/static/`
- Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf`
You can specify custom fonts using `--font-dir` and `--font` parameters.
## Performance Tips
- Use `--max-samples` to limit processing for large datasets
- Disable advanced effects with `--no-advanced-effects` for faster generation
- Use multiprocessing with `--use-multiprocessing` for batch jobs
- Adjust image dimensions to balance quality and speed
## Error Handling
The package includes comprehensive error handling:
- Graceful fallbacks for missing dependencies
- Detailed logging for debugging
- Validation of input parameters
- Safe handling of malformed datasets
## Contributing
The modular structure makes it easy to extend:
- Add new effects in `effects.py`
- Implement new background styles in `backgrounds.py`
- Create custom transformations in `transformations.py`
- Extend dataset processing in `huggingface_processor.py`
## License
[Add your license information here]
---
**Note**: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.