---
title: Pose Preserving Comicfier
emoji: 🤠🎞️
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit
short_description: 'Comicfier: Transforms photos into retro Western comic style'
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Pose-Preserving Comicfier - Gradio App

[Author profile](https://huggingface.co/Mer-o) · [Live demo on Hugging Face Spaces](https://huggingface.co/spaces/Mer-o/Pose-Preserving-Comicfier)
This repository contains the code for a Gradio web application that transforms input images into a specific retro Western comic book style while preserving the original pose. It uses Stable Diffusion v1.5, ControlNet (OpenPose + Tile), and two specific LoRAs.

This application refactors the workflow initially developed in a [Kaggle Notebook](https://github.com/mehran-khani/SD-Controlnet-Comic-Styler) into a deployable web app.
## Features

* **Pose Preservation:** Uses ControlNet OpenPose to accurately maintain the pose from the input image.
* **Retro Comic Style Transfer:** Applies specific LoRAs (`night_comic_V06.safetensors` & `add_detail.safetensors`) for a 1940s Western comic aesthetic with enhanced details.
* **Tiled Upscaling:** Uses ControlNet Tile for 2x high-resolution output (1024x1024), improving detail consistency across the image.
* **Simplified UI:** Easy-to-use interface with only an image upload and a generate button.
* **Fixed Parameters:** Generation uses pre-defined, tuned parameters (steps, guidance, strength, prompts) from the original notebook implementation for consistent results.
* **Dynamic Backgrounds:** Background elements in the generated image are randomized during the low-resolution stage for variety.
* **Broad Image Support:** Accepts common formats such as JPG, PNG, WEBP, and HEIC (requires `pillow-heif`).
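The dynamic-background behavior can be sketched as follows; the prompt wording, background list, and helper name here are illustrative, not the actual contents of `prompts.py`:

```python
import random

# Fixed base/style portion of the prompt (illustrative wording only).
BASE_PROMPT = "1940s western comic style, bold ink lines, halftone shading"

# Hypothetical pool of background elements; the real list lives in prompts.py.
BACKGROUNDS = [
    "dusty saloon interior",
    "desert canyon at sunset",
    "frontier town main street",
]

def build_prompt(rng: random.Random = random) -> str:
    """Combine the fixed style prompt with one randomly chosen background."""
    return f"{BASE_PROMPT}, background of a {rng.choice(BACKGROUNDS)}"
```

Keeping the style portion fixed while only the background varies is what gives consistent results across runs with some visual variety.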
## Technology Stack

* **Python 3**
* **Gradio:** Web UI framework.
* **PyTorch:** Core ML framework.
* **Hugging Face Libraries:**
  * `diffusers`: Stable Diffusion pipelines, ControlNet integration.
  * `transformers`: Underlying model components.
  * `accelerate`: Hardware acceleration utilities.
  * `peft`: LoRA loading and management.
* **ControlNet:**
  * OpenPose Detector (`controlnet_aux`)
  * OpenPose ControlNet Model (`lllyasviel/sd-controlnet-openpose`)
  * Tile ControlNet Model (`lllyasviel/control_v11f1e_sd15_tile`)
* **Base Model:** `runwayml/stable-diffusion-v1-5`
* **LoRAs Used:**
  * Style: [Western Comics Style](https://civitai.com/models/1081588/western-comics-style) (using `night_comic_V06.safetensors`)
  * Detail: [Detail Tweaker LoRA](https://civitai.com/models/58390/detail-tweaker-lora-lora) (using `add_detail.safetensors`)
* **Image Processing:** `Pillow`, `pillow-heif`, `numpy`, `opencv-python-headless`
* **Dependencies:** `matplotlib`, `mediapipe` (required by `controlnet_aux`)
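Based on the stack above, a `requirements.txt` for the project might look like the following (unpinned sketch; the actual file may pin specific versions):

```text
gradio
torch
diffusers
transformers
accelerate
peft
controlnet_aux
Pillow
pillow-heif
numpy
opencv-python-headless
matplotlib
mediapipe
```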
## Workflow Overview

1. **Image Preparation (`image_utils.py`):** The input image is loaded (HEIC supported), converted to RGB, EXIF orientation is handled, and the image is force-resized to 512x512.
2. **Pose Detection (`pipelines.py`):** An OpenPose map is extracted from the resized image using `controlnet_aux`.
3. **Low-Resolution Generation (`pipelines.py`):**
   * An SD v1.5 Img2Img pipeline with the Pose ControlNet is dynamically loaded.
   * Prompts are generated (`prompts.py`) with a fixed base/style and a *randomized* background element.
   * Style and Detail LoRAs are applied.
   * A 512x512 image is generated using fixed parameters.
   * The pipeline is unloaded to conserve VRAM.
4. **High-Resolution Tiling (`pipelines.py`):**
   * The 512x512 image is upscaled 2x (to 1024x1024) using bicubic interpolation, producing a blurry base.
   * An SD v1.5 Img2Img pipeline with the Tile ControlNet is dynamically loaded.
   * Tile-specific prompts (excluding the random background) are used.
   * Style and Detail LoRAs are applied (potentially with different weights).
   * The image is processed in overlapping tiles.
   * Processed tiles are blended back together using an alpha mask (`image_utils.py`).
   * The pipeline is unloaded.
5. **Output (`app.py`):** The final 1024x1024 image is displayed in the Gradio UI.
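The tile split-and-blend step can be illustrated with plain NumPy. This is a minimal sketch: the tile size, overlap, and linear alpha ramp are illustrative, not the exact values used in `image_utils.py`, and the per-tile "processing" here is the identity (where the app would run the Tile pipeline):

```python
import numpy as np

def blend_tiles(image: np.ndarray, tile: int = 512, overlap: int = 64) -> np.ndarray:
    """Split `image` into overlapping tiles, process each (identity here),
    and blend them back with a feathered alpha mask so seams disappear."""
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)

    # 1D feathering ramp: strictly positive, rising over `overlap` pixels,
    # flat at 1.0 in the tile interior.
    ramp = np.ones(tile)
    ramp[:overlap] = np.linspace(0.0, 1.0, overlap + 2)[1:-1][:overlap]
    ramp[-overlap:] = ramp[:overlap][::-1]
    mask = np.outer(ramp, ramp)  # 2D alpha mask for one tile

    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            y0, x0 = min(y, h - tile), min(x, w - tile)  # clamp last tile
            patch = image[y0:y0 + tile, x0:x0 + tile].astype(np.float64)
            out[y0:y0 + tile, x0:x0 + tile] += patch * mask
            weight[y0:y0 + tile, x0:x0 + tile] += mask
    # Normalize by accumulated weights to get the blended result.
    return out / np.maximum(weight, 1e-12)
```

With identity processing the output reproduces the input exactly, which is a useful sanity check that the blending weights sum correctly everywhere.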
## How to Run Locally

*(Requires sufficient RAM/CPU or a compatible GPU, Python 3.8+, and Git)*

1. **Clone the repository:**
   ```bash
   git clone https://github.com/mehran-khani/Pose-Preserving-Comicfier.git
   cd Pose-Preserving-Comicfier
   ```
2. **Create and activate a Python virtual environment:**
   ```bash
   python3 -m venv .venv
   source .venv/bin/activate          # macOS/Linux
   # .\.venv\Scripts\Activate.ps1    # Windows PowerShell
   # .\.venv\Scripts\activate.bat    # Windows cmd.exe
   ```
3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
   *(Note: PyTorch installation may require specific commands depending on your OS/CUDA setup if using a local GPU. See the PyTorch website.)*
4. **Download the LoRA files:**
   * Create a folder named `loras` in the project root.
   * Download `night_comic_V06.safetensors` (from the Civitai link above) and place it in the `loras` folder.
   * Download `add_detail.safetensors` (from the Civitai link above) and place it in the `loras` folder.
5. **Run the Gradio app:**
   ```bash
   python app.py
   ```
6. Open the local URL shown in the terminal (e.g., `http://127.0.0.1:7860`) in your browser. *(Note: execution will be very slow without a suitable GPU.)*
## Deployment to Hugging Face Spaces

This app is designed for deployment on Hugging Face Spaces, ideally with GPU hardware.

1. Ensure all code (`*.py`), `requirements.txt`, `.gitignore`, and the `loras` folder (containing the `.safetensors` files) are committed and pushed to this GitHub repository.
2. Create a new Space on Hugging Face ([huggingface.co/new-space](https://huggingface.co/new-space)).
3. Choose an owner, a Space name, and select "Gradio" as the Space SDK.
4. Select the desired hardware (e.g., "T4 small" under GPU options). Note that compute costs may apply.
5. Choose "Use existing GitHub repository".
6. Enter the URL of this GitHub repository.
7. Click "Create Space". The Space will build the environment from `requirements.txt` and run `app.py`. Monitor the build and runtime logs for any issues.
## Limitations

* **Speed:** Generation takes minutes, especially on shared/free GPU hardware, due to the multi-stage process and dynamic model loading between stages. CPU execution is impractically slow.
* **VRAM:** Although dynamic pipeline unloading reduces peak usage, the process still requires considerable GPU VRAM (>10GB peak). Out-of-memory errors are possible on lower-VRAM GPUs.
* **Fixed Style:** The artistic style (prompts, LoRAs, parameters) is fixed in the code to replicate the notebook's specific output and cannot be changed via the UI.
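The dynamic-unloading pattern used between the low-resolution and tiling stages can be sketched generically. This is an illustrative helper, not the code in `pipelines.py`, and it tolerates environments without PyTorch or a GPU:

```python
import gc

try:
    import torch
except ImportError:  # CPU-only or minimal environments without PyTorch
    torch = None

def unload_pipeline(pipe) -> None:
    """Drop the reference to a diffusers pipeline and reclaim GPU memory.

    Note: `del` only removes this function's reference; callers must also
    drop their own references for the memory to actually be freed.
    """
    del pipe
    gc.collect()  # release Python-side objects first
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
```

Unloading the Pose pipeline before loading the Tile pipeline is what keeps the peak VRAM usage to one pipeline at a time instead of two.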
## License

MIT License