Calligrapher: Freestyle Text Image Customization

📄 Project Page | 📦 Code | 🎥 Video

🎯 Overview

Calligrapher is a diffusion-based framework that combines text customization with artistic typography for digital calligraphy and design applications. It supports three customization settings: self-reference, cross-reference, and non-text reference.

✨ Key Features

🎨 Freestyle Text Customization: Generate stylized text from diverse reference images and text prompts
🔄 Various Reference Modes: Support for self-reference, cross-reference, and non-text reference customization
🚀 High-Quality Results: Photorealistic text image customization with consistent typography

📦 Repository Contents

This Hugging Face repository contains:

calligrapher.bin: Pre-trained Calligrapher model weights.
Calligrapher_bench_testing.zip: Test dataset with examples for both self-reference and cross-reference customization, plus additional reference images for testing. A small portion of samples is omitted due to IP concerns.

🛠️ Quick Start

Installation

We provide two ways to set up the environment (requiring Python 3.10 + PyTorch 2.5.0 + CUDA):

Using pip

# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Install dependencies
pip install -r requirements.txt

Using Conda

# Clone the repository
git clone https://github.com/Calligrapher2025/Calligrapher.git
cd Calligrapher

# Create and activate conda environment
conda env create -f env.yml
conda activate calligrapher
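
After either setup route, it can help to confirm that the active interpreter matches the versions listed above. The following is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper that only reports what it finds, and it treats a missing PyTorch install as a normal case rather than an error.

```python
import sys


def check_environment(expected_python=(3, 10)):
    """Report the Python version and, if PyTorch is installed, its version and CUDA status."""
    report = {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": sys.version_info[:2] == tuple(expected_python),
        "torch_version": None,   # stays None when PyTorch is not importable
        "cuda_available": None,  # stays None when PyTorch is not importable
    }
    try:
        import torch  # imported lazily so the check also runs before installation
    except (ImportError, OSError):
        return report
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A `torch_version` that does not start with `2.5.0`, or `cuda_available: False`, suggests the environment differs from the one the instructions above assume.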

Download Models & Testing Data

from huggingface_hub import snapshot_download

# Download Calligrapher model and test data
snapshot_download("Calligrapher2025/Calligrapher")
# Download required base models (FLUX.1-Fill-dev is gated: request access on its model page and pass your Hugging Face token)
snapshot_download("black-forest-labs/FLUX.1-Fill-dev", token="your_token")
snapshot_download("google/siglip-so400m-patch14-384")

Configuration

Before running the models, you need to configure the paths in path_dict.json:

{
  "data_dir": "path/to/Calligrapher_bench_testing",
  "cli_save_dir": "path/to/cli_results",
  "gradio_save_dir": "path/to/gradio_results",
  "gradio_temp_dir": "path/to/gradio_tmp",
  "base_model_path": "path/to/FLUX.1-Fill-dev",
  "image_encoder_path": "path/to/siglip-so400m-patch14-384",
  "calligrapher_path": "path/to/calligrapher.bin"
}

Configuration parameters:

data_dir: Path to store the test dataset
cli_save_dir: Path to save results from command-line interface experiments
gradio_save_dir: Path to save results from Gradio interface experiments
gradio_temp_dir: Path to save Gradio temporary files
base_model_path: Path to the base model FLUX.1-Fill-dev
image_encoder_path: Path to the SigLIP image encoder model
calligrapher_path: Path to the Calligrapher model weights
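
The configuration can also be generated rather than edited by hand. The sketch below is an illustration under stated assumptions, not the repository's own tooling: it assumes `Calligrapher_bench_testing.zip` has been extracted next to `calligrapher.bin` inside the downloaded Calligrapher folder, and the three output folder names under `output_root` are placeholders. The helper names (`build_path_dict`, `missing_paths`, `write_path_dict`) are hypothetical. `snapshot_download` returns the local folder it downloaded to, so its return values can be passed in directly.

```python
import json
from pathlib import Path

REQUIRED_KEYS = (
    "data_dir", "cli_save_dir", "gradio_save_dir", "gradio_temp_dir",
    "base_model_path", "image_encoder_path", "calligrapher_path",
)


def build_path_dict(calligrapher_dir, flux_dir, siglip_dir, output_root):
    """Assemble the seven entries of path_dict.json from four folders."""
    calligrapher_dir = Path(calligrapher_dir)
    output_root = Path(output_root)
    return {
        # Assumption: the test zip was extracted inside the Calligrapher folder.
        "data_dir": str(calligrapher_dir / "Calligrapher_bench_testing"),
        # Assumption: output folder names are placeholders, choose your own.
        "cli_save_dir": str(output_root / "cli_results"),
        "gradio_save_dir": str(output_root / "gradio_results"),
        "gradio_temp_dir": str(output_root / "gradio_tmp"),
        "base_model_path": str(Path(flux_dir)),
        "image_encoder_path": str(Path(siglip_dir)),
        "calligrapher_path": str(calligrapher_dir / "calligrapher.bin"),
    }


def missing_paths(path_dict):
    """Return the input entries (data and models) that do not exist on disk."""
    input_keys = ("data_dir", "base_model_path", "image_encoder_path", "calligrapher_path")
    return [key for key in input_keys if not Path(path_dict[key]).exists()]


def write_path_dict(path_dict, target="path_dict.json"):
    """Write the configuration after checking that every expected key is present."""
    absent = [key for key in REQUIRED_KEYS if key not in path_dict]
    if absent:
        raise KeyError(f"missing configuration keys: {absent}")
    Path(target).write_text(json.dumps(path_dict, indent=2) + "\n", encoding="utf-8")
```

Calling `missing_paths` before launching a demo gives an early warning when a model folder or the extracted dataset is not where the configuration points.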

Run Gradio Demo

# Basic Gradio demo
python gradio_demo.py

# Demo with custom mask upload (recommended for first-time users)
# This version includes pre-configured examples; try them first to learn how to use the model
python gradio_demo_upload_mask.py


✨ User Tips:

Speed vs Quality Trade-off. Use fewer steps (e.g., 10 steps, which takes ~4s/image on a single A6000 GPU) for faster generation, at some cost in quality.
Inpaint Position Freedom. Inpainting positions are flexible - they don't necessarily need to match the original text locations in the input image.
Iterative Editing. Drag outputs from the gallery to the Image Editing Panel (clear the Editing Panel first) for quick refinements.
Mask Optimization. Adjust the mask size and aspect ratio to match your desired content. The model tends to fill the mask and harmonizes the generation with the background in color and lighting.
Reference Image Tip. White-background references improve style consistency, because the encoder also considers the background context of the reference image.
Resolution Balance. Very high-resolution generation sometimes triggers spelling errors. 512px or 768px is recommended, since the model is trained at a resolution of 512.
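
The resolution tip above can be applied by rescaling the input dimensions before generation. The sketch below is a hypothetical helper (`fit_resolution` is not part of the repository) that scales a size so its longer side lands on the recommended 512 or 768 pixels. Snapping both sides to a multiple of 16 is an assumption made here for compatibility with latent-space models; check the demo scripts for the sizes they actually accept.

```python
def fit_resolution(width, height, long_side=512, multiple=16):
    """Scale (width, height) so the longer side is about `long_side`, snapped to `multiple`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = long_side / max(width, height)

    def snap(value):
        # Round the scaled side to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)


print(fit_resolution(1024, 1024))                 # (512, 512)
print(fit_resolution(1920, 1080, long_side=768))  # (768, 432)
```

Because both sides are rounded independently, the aspect ratio can shift slightly for unusual input sizes.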

🎨 Command Line Usage Examples

Self-reference Customization

python infer_calligrapher_self_custom.py

Cross-reference Customization

python infer_calligrapher_cross_custom.py

Note: Output images whose filenames start with "result" are the customization outputs, while files whose names start with "vis_result" are concatenated images showing the source image, reference image, and model output side by side.
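
When post-processing a results folder, the two kinds of files can be separated by the prefixes described in the note. This is a minimal sketch with hypothetical helper names (`classify_result`, `split_outputs`). It relies only on the "result" and "vis_result" prefixes and makes no assumption about the rest of the filename or the file extension.

```python
from pathlib import Path


def classify_result(filename):
    """Return 'visualization', 'output', or None based on the filename prefix."""
    if filename.startswith("vis_result"):
        return "visualization"
    if filename.startswith("result"):
        return "output"
    return None


def split_outputs(save_dir):
    """Group the files in `save_dir` into customization outputs and visualizations."""
    groups = {"output": [], "visualization": []}
    for path in sorted(Path(save_dir).iterdir()):
        kind = classify_result(path.name) if path.is_file() else None
        if kind is not None:
            groups[kind].append(path)
    return groups["output"], groups["visualization"]
```

For example, `outputs, visualizations = split_outputs("path/to/cli_results")` would collect the two groups from the folder configured as `cli_save_dir`.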

📊 Framework

Our framework integrates localized style injection and diffusion-based learning, featuring:

Self-distillation mechanism for automatic typography benchmark construction.
Localized style injection via a trainable style encoder.
In-context generation for enhanced style alignment.

🎭 Results Gallery
