---
title: Pose Preserving Comicfier
emoji: 🤠🎞️
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit
short_description: 'Comicfier: Transforms photos into retro Western comic style'
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Pose-Preserving Comicfier - Gradio App

[Live demo on Hugging Face Spaces](https://huggingface.co/spaces/Mer-o/Pose-Preserving-Comicfier)
This repository contains the code for a Gradio web application that transforms input images into a specific retro Western comic-book style while preserving the original pose. It uses Stable Diffusion v1.5, ControlNet (OpenPose + Tile), and specific LoRAs.

The app refactors a workflow originally developed in a Kaggle notebook into a deployable web application.
## Features

- Pose Preservation: Uses ControlNet OpenPose to accurately maintain the pose from the input image.
- Retro Comic Style Transfer: Applies specific LoRAs (`night_comic_V06.safetensors` and `add_detail.safetensors`) for a 1940s Western comic aesthetic with enhanced details.
- Tiling Upscaling: Implements ControlNet Tile for 2x high-resolution output (1024x1024), improving detail consistency over large images.
- Simplified UI: Easy-to-use interface with only an image upload and a generate button.
- Fixed Parameters: Generation uses pre-defined, optimized parameters (steps, guidance, strength, prompts) based on the original notebook implementation for consistent results.
- Dynamic Backgrounds: The background elements in the generated image are randomized for variety in the low-resolution stage.
- Broad Image Support: Accepts common formats like JPG, PNG, WEBP, and HEIC (requires `pillow-heif`).
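The HEIC support mentioned above can be sketched as follows. `load_rgb` is a hypothetical helper name (the app's actual loading lives in `image_utils.py`), and the `pillow-heif` import is guarded so the snippet degrades gracefully when the package is absent:

```python
from PIL import Image, ImageOps

try:
    # pillow-heif registers an HEIF/HEIC decoder with Pillow, after which
    # Image.open() can read .heic files like any other supported format.
    from pillow_heif import register_heif_opener
    register_heif_opener()
except ImportError:
    pass  # without pillow-heif, HEIC files simply won't open

def load_rgb(path: str) -> Image.Image:
    """Open an image, apply EXIF orientation, and convert to RGB."""
    img = Image.open(path)
    img = ImageOps.exif_transpose(img)  # honor camera rotation metadata
    return img.convert("RGB")
```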
## Technology Stack

- Python 3
- Gradio: Web UI framework.
- PyTorch: Core ML framework.
- Hugging Face Libraries:
  - `diffusers`: Stable Diffusion pipelines, ControlNet integration.
  - `transformers`: Underlying model components.
  - `accelerate`: Hardware acceleration utilities.
  - `peft`: LoRA loading and management.
- ControlNet:
  - OpenPose Detector (`controlnet_aux`)
  - OpenPose ControlNet Model (`lllyasviel/sd-controlnet-openpose`)
  - Tile ControlNet Model (`lllyasviel/control_v11f1e_sd15_tile`)
- Base Model: `runwayml/stable-diffusion-v1-5`
- LoRAs Used:
  - Style: Western Comics Style (`night_comic_V06.safetensors`)
  - Detail: Detail Tweaker LoRA (`add_detail.safetensors`)
- Image Processing: `Pillow`, `pillow-heif`, `numpy`, `opencv-python-headless`
- Other Dependencies: `matplotlib`, `mediapipe` (required by `controlnet_aux`)
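As a rough sketch of how these pieces fit together, the low-resolution stage's pipeline could be assembled like this. The model IDs are the ones listed above, but the function name and adapter weights are illustrative, not the app's actual values (see `pipelines.py` for those). Imports are kept inside the function so the module stays importable without the heavy dependencies installed:

```python
def load_lowres_pipeline(device: str = "cuda"):
    """Illustrative assembly of the SD v1.5 + OpenPose img2img pipeline."""
    import torch
    from diffusers import (
        ControlNetModel,
        StableDiffusionControlNetImg2ImgPipeline,
    )

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
        safety_checker=None,
    )
    # Style + detail LoRAs; the 0.8/0.5 scales are placeholder values.
    pipe.load_lora_weights("loras", weight_name="night_comic_V06.safetensors",
                           adapter_name="style")
    pipe.load_lora_weights("loras", weight_name="add_detail.safetensors",
                           adapter_name="detail")
    pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.5])
    return pipe.to(device)
```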
## Workflow Overview

1. Image Preparation (`image_utils.py`): The input image is loaded (supports HEIC), converted to RGB, EXIF data handled, and force-resized to 512x512.
2. Pose Detection (`pipelines.py`): An OpenPose map is extracted from the resized image using `controlnet_aux`.
3. Low-Resolution Generation (`pipelines.py`):
   - An SD v1.5 Img2Img pipeline with the Pose ControlNet is dynamically loaded.
   - Prompts are generated (`prompts.py`) with a fixed base/style and a randomized background element.
   - Style and Detail LoRAs are applied.
   - A 512x512 image is generated using fixed parameters.
   - The pipeline is unloaded to conserve VRAM.
4. High-Resolution Tiling (`pipelines.py`):
   - The 512x512 image is upscaled 2x (to 1024x1024) using bicubic interpolation (creating a blurry base).
   - An SD v1.5 Img2Img pipeline with the Tile ControlNet is dynamically loaded.
   - Tile-specific prompts (excluding the random background) are used.
   - Style and Detail LoRAs are applied (potentially with different weights).
   - The image is processed in overlapping tiles.
   - Processed tiles are blended back together using an alpha mask (`image_utils.py`).
   - The pipeline is unloaded.
5. Output (`app.py`): The final 1024x1024 image is displayed in the Gradio UI.
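The tile blending in the high-resolution stage can be sketched with a linear alpha-ramp mask like the one below. The tile size, overlap, and function name are illustrative rather than the actual `image_utils.py` code:

```python
import numpy as np
from PIL import Image

def blend_tiles(tiles, positions, out_size, tile_size=512, overlap=64):
    """Paste processed tiles onto a canvas, feathering overlaps linearly.

    `positions` holds the (x, y) top-left corner of each tile;
    `out_size` is the (width, height) of the final image.
    """
    w, h = out_size
    acc = np.zeros((h, w, 3), dtype=np.float32)     # weighted color sums
    weight = np.zeros((h, w, 1), dtype=np.float32)  # sums of mask weights
    # 1-D ramp rising over `overlap` pixels at each edge; it never reaches
    # exactly zero, so a lone border tile still contributes at full value.
    ramp = np.minimum((np.arange(tile_size) + 1) / overlap, 1.0)
    mask1d = np.minimum(ramp, ramp[::-1])
    mask = np.outer(mask1d, mask1d)[..., None].astype(np.float32)
    for tile, (x, y) in zip(tiles, positions):
        t = np.asarray(tile, dtype=np.float32)
        acc[y:y + tile_size, x:x + tile_size] += t * mask
        weight[y:y + tile_size, x:x + tile_size] += mask
    # Normalize by total weight so overlaps average instead of brightening.
    return Image.fromarray((acc / np.maximum(weight, 1e-6)).astype(np.uint8))
```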
## How to Run Locally

(Requires sufficient RAM/CPU or a compatible GPU, Python 3.8+, and Git.)

1. Clone the repository:

   ```bash
   git clone https://github.com/mehran-khani/Pose-Preserving-Comicfier.git
   cd Pose-Preserving-Comicfier
   ```

2. Create and activate a Python virtual environment:

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   # Windows PowerShell: .\.venv\Scripts\Activate.ps1
   # Windows cmd:        .\.venv\Scripts\activate.bat
   ```

3. Install dependencies (note: PyTorch installation might require specific commands depending on your OS/CUDA setup if using a local GPU; see the PyTorch website):

   ```bash
   pip install -r requirements.txt
   ```

4. Download the LoRA files:
   - Create a folder named `loras` in the project root.
   - Download `night_comic_V06.safetensors` (from Civitai) and place it in the `loras` folder.
   - Download `add_detail.safetensors` (from Civitai) and place it in the `loras` folder.

5. Run the Gradio app:

   ```bash
   python app.py
   ```

6. Open the local URL provided (e.g., `http://127.0.0.1:7860`) in your browser. (Note: execution will be very slow without a suitable GPU.)
## Deployment to Hugging Face Spaces

This app is designed for deployment on Hugging Face Spaces, ideally with GPU hardware.

1. Ensure all code (`*.py`), `requirements.txt`, `.gitignore`, and the `loras` folder (containing the `.safetensors` files) are committed and pushed to this GitHub repository.
2. Create a new Space on Hugging Face (huggingface.co/new-space).
3. Choose an owner, a Space name, and select "Gradio" as the Space SDK.
4. Select the desired hardware (e.g., "T4 small" under GPU options). Note that compute costs may apply.
5. Choose "Use existing GitHub repository".
6. Enter the URL of this GitHub repository.
7. Click "Create Space". The Space will build the environment from `requirements.txt` and run `app.py`. Monitor the build and runtime logs for any issues.
## Limitations

- Speed: Generation requires significant time (minutes), especially on shared/free GPU hardware, due to the multi-stage process and dynamic model loading between stages. CPU execution is impractically slow.
- VRAM: While optimized with dynamic pipeline unloading, the process still requires considerable GPU VRAM (>10GB peak). Out-of-memory errors are possible on lower-VRAM GPUs.
- Fixed Style: The artistic style (prompts, LoRAs, parameters) is fixed in the code to replicate the notebook's specific output and cannot be changed via the UI.
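The dynamic pipeline unloading mentioned above generally boils down to dropping every reference to the pipeline, collecting it, and asking PyTorch to release its cached blocks. This is a generic sketch (the helper name and registry dict are illustrative, not the app's code):

```python
import gc

def release_pipeline(registry: dict, key: str) -> None:
    """Drop the pipeline stored under `key` and reclaim GPU memory.

    Any other references to the pipeline must also be dropped by the
    caller, or the VRAM stays allocated despite this call.
    """
    registry.pop(key, None)
    gc.collect()  # collect the now-unreferenced pipeline object graph
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached VRAM to the driver
    except ImportError:
        pass  # nothing to reclaim on a torch-free machine
```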
## License

MIT License