AI Image Processing Toolkit


A collection of specialized scripts for AI image processing, dataset preparation, and model training workflows.

🛠️ Scripts Overview


wdv3

An image tagging script that uses SmilingWolf's WD V3 tagger models, based on this repo. It supports multiple model architectures (ViT, SwinV2, ConvNeXt) and can process single images or whole directories recursively.

Features

  • Multiple model architecture support
  • Batch processing capabilities
  • Adjustable confidence thresholds
  • CUDA acceleration with FP16 support
  • JXL image format support

train_functions

A set of ZSH functions for managing AI model training workflows:

  • Script execution management
  • Training variable setup
  • Git repository state tracking
  • Output directory management
  • Automatic cleanup of empty outputs

git-wrapper

Enhanced Git functionality for dataset management:

  • Automatic submodule handling
  • LFS integration for JXL files
  • Dataset-specific Git attributes management

check4sig

Utility that detects watermark mentions in dataset caption files:

  • Scans .caption files for watermark-related text
  • Batch processing support
  • Interactive editing with nvim
  • Recursive directory scanning
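The scanning step could look roughly like the sketch below. It is not the script's actual code: the keyword list and the function names `mentions_watermark` and `find_flagged_captions` are assumptions made for illustration, and the interactive nvim step is left out.

```python
from pathlib import Path

# Hypothetical keyword list; the real script's patterns may differ.
WATERMARK_KEYWORDS = ("watermark", "signature", "artist name")


def mentions_watermark(text: str, keywords=WATERMARK_KEYWORDS) -> bool:
    """Return True if any keyword occurs in the caption (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def find_flagged_captions(root: Path, keywords=WATERMARK_KEYWORDS) -> list[Path]:
    """Recursively collect .caption files whose text mentions a keyword."""
    flagged = []
    for path in sorted(root.rglob("*.caption")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if mentions_watermark(text, keywords):
            flagged.append(path)
    return flagged
```

The returned paths could then be handed to an editor one by one for manual review.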

gallery-dl

Directory-aware wrapper for gallery-dl:

  • Automatically changes to ~/datasets directory
  • Maintains consistent download locations
  • Preserves original command functionality

joy

Image captioning with JoyCaption by fancyfeast, which combines CLIP and an LLM:

  • Multiple caption styles (descriptive, training prompts, art critic, etc.)
  • Custom image adapters
  • Tag-based caption generation
  • Batch processing support

png2mp4

The png2mp4 script converts sequences of PNG images into MP4 videos, particularly useful for visualizing training progress or creating animation sequences. It supports multiple sample sets and various customization options.

Basic usage (repeat each frame 16 times):

png2mp4 --repeat 16

Automatically detect step size from filenames:

png2mp4 --steps-from-filename --repeat 8

Add step counter overlay and limit frames:

png2mp4 --step 50 --max 100 --repeat 4

Parameters:

  • --repeat: Number of times to repeat each frame (default: 1)
  • --step: Step multiplier for the frame counter overlay
  • --max: Maximum number of frames to include
  • --steps-from-filename: Automatically calculate step size from filename patterns

Features:

  • Automatically processes multiple sample sets in the current directory
  • Creates high-quality MP4s with configurable bitrate (12Mbps)
  • Adds fade-out effect at the end
  • Supports step counter overlay
  • Creates temporary files in ~/.local/tmp
  • Output format: {current_directory_name}_sample{N}.mp4

Example filename patterns:

output_01_000000.png  # First frame, sample 01
output_01_000100.png  # Second frame, sample 01
output_02_000000.png  # First frame, sample 02
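To make the filename convention concrete, here is a minimal Python sketch of how frames might be grouped per sample and how a step size could be derived from the names. The regular expression and the helper names `group_frames` and `detect_step_size` are assumptions based only on the example patterns above; the script itself may parse filenames differently.

```python
import re
from collections import defaultdict

# Assumed pattern: <prefix>_<sample>_<step>.png, as in output_01_000100.png
FRAME_RE = re.compile(r"^.+_(?P<sample>\d+)_(?P<step>\d+)\.png$")


def group_frames(filenames):
    """Map each sample id to a sorted list of (step, filename) pairs."""
    groups = defaultdict(list)
    for name in filenames:
        match = FRAME_RE.match(name)
        if match:  # silently ignore files that do not follow the pattern
            groups[match["sample"]].append((int(match["step"]), name))
    return {sample: sorted(frames) for sample, frames in groups.items()}


def detect_step_size(frames):
    """Smallest gap between distinct steps, or None with fewer than two steps."""
    steps = sorted({step for step, _ in frames})
    gaps = [later - earlier for earlier, later in zip(steps, steps[1:])]
    return min(gaps) if gaps else None
```

With the three example files, sample 01 has two frames 100 steps apart, while sample 02 has a single frame and therefore no detectable step size.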

xyplot

Image comparison grid generator:

  • Supports multiple image formats
  • Customizable grid layouts
  • Optional row/column labels
  • Automatic image padding and alignment

concat_captions

Utility for combining multiple caption files:

  • Merges .caption and .tags files
  • Maintains original image associations
  • Batch processing support
  • Error handling for missing files
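As a rough illustration of the merge, the sketch below pairs each .caption file with the .tags file of the same stem. The output suffix (`.txt`), the "tags, caption" ordering, and the function names are all assumptions for this example; the script's real output format is not documented here.

```python
from pathlib import Path


def merge_pair(tags_path: Path, caption_path: Path) -> str:
    """Join tags and caption with a comma; fall back to whichever is non-empty."""
    tags = tags_path.read_text(encoding="utf-8").strip()
    caption = caption_path.read_text(encoding="utf-8").strip()
    return f"{tags}, {caption}" if tags and caption else tags or caption


def concat_directory(root: Path, out_suffix: str = ".txt") -> list[Path]:
    """Write a merged file next to each .caption that has a matching .tags file."""
    written = []
    for caption_path in sorted(root.glob("*.caption")):
        tags_path = caption_path.with_suffix(".tags")
        if not tags_path.exists():
            # Missing partner file: report it and move on instead of failing.
            print(f"skipping {caption_path.name}: no matching .tags file")
            continue
        out_path = caption_path.with_suffix(out_suffix)
        out_path.write_text(merge_pair(tags_path, caption_path), encoding="utf-8")
        written.append(out_path)
    return written
```

Because the output keeps the same stem as its inputs, the merged file stays associated with the original image.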

stats

Directory analysis and statistics generation tool that provides detailed file counts and metrics:

  • Detailed file counting by extension with color-coded output for different file types (JXL, PNG, JPG, etc.)
  • Multiple sorting options (by name, count, or specific file types)
  • Recursive directory scanning with aggregated statistics
  • Color-coded thresholds for dataset size evaluation
  • Automatic categorization of files into image and text groups
  • Grand total calculations across all subdirectories
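The core counting logic can be approximated in a few lines. This is only a sketch under stated assumptions: the extension groups `IMAGE_EXTS` and `TEXT_EXTS` and the function names are made up for illustration, and the color-coded output and sorting options are omitted.

```python
from collections import Counter
from pathlib import Path

# Assumed groupings; the actual tool may categorize extensions differently.
IMAGE_EXTS = {".jxl", ".png", ".jpg", ".jpeg", ".webp"}
TEXT_EXTS = {".txt", ".caption", ".tags"}


def count_by_extension(root: Path) -> Counter:
    """Count files under root (recursively) by lower-cased extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def summarize(counts: Counter) -> dict:
    """Aggregate per-extension counts into image, text, and grand totals."""
    return {
        "images": sum(n for ext, n in counts.items() if ext in IMAGE_EXTS),
        "text": sum(n for ext, n in counts.items() if ext in TEXT_EXTS),
        "total": sum(counts.values()),
    }
```

Calling `summarize(count_by_extension(Path("~/datasets").expanduser()))` would give one grand total; per-subdirectory statistics could be built by calling the same functions on each child directory.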

shortcode

Hugo-compatible shortcode generator for image galleries with blurhash integration:

  • Generates Hugo-compatible shortcode blocks for each image
  • Integrates blurhash codes for progressive image loading
  • Automatically extracts and includes image dimensions
  • Preserves and integrates image captions from metadata
  • Supports grid layout configurations
  • Processes directories recursively while maintaining structure
  • Handles relative path resolution for static content

yiffdata

Comprehensive image metadata extraction and JSON generation utility:

  • Extracts precise image dimensions using PIL
  • Combines existing blurhash codes from .bh files
  • Integrates caption data from .caption files
  • Generates consolidated JSON output with all metadata
  • Maintains original filename references
  • Supports batch processing of entire directories
  • Preserves file relationships and metadata hierarchy

txt2tags

Batch file extension conversion utility for dataset management:

  • Converts .txt files to .tags format for ML training compatibility
  • Preserves original file content and structure
  • Supports recursive directory traversal
  • Interactive mode for selective conversion
  • Maintains original file timestamps and permissions
  • Simple command-line interface with directory input
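A minimal sketch of the conversion, assuming it is a plain rename: renaming leaves content, modification time, and permissions untouched. The function name `txt_to_tags` and the `dry_run` flag are hypothetical, and the interactive mode is not reproduced.

```python
from pathlib import Path


def txt_to_tags(root: Path, recursive: bool = True, dry_run: bool = False) -> list[tuple[Path, Path]]:
    """Rename every .txt file under root to .tags and return the (old, new) pairs."""
    pattern = "**/*.txt" if recursive else "*.txt"
    renamed = []
    # sorted() materializes the file list before any rename happens.
    for src in sorted(root.glob(pattern)):
        dst = src.with_suffix(".tags")
        if dst.exists():
            continue  # never overwrite an existing .tags file
        if not dry_run:
            src.rename(dst)
        renamed.append((src, dst))
    return renamed
```

Running it once with `dry_run=True` shows what would change before committing to the rename.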

txt2emoji

Advanced text-to-emoji conversion system with context awareness:

  • Sophisticated word-to-emoji mapping with custom dictionaries
  • Context-aware emoji selection to avoid redundancy
  • Detailed conversion explanations with rationale
  • Batch processing with multiple output formats
  • Configurable threshold and filtering options
  • NLTK integration for improved text parsing
  • Extensive customization options for emoji mappings

jtp2

Image classification system using RedRocket's PILOT2 model:

  • Implements Vision Transformer architecture with custom modifications
  • Features GatedHead classifier for improved accuracy
  • CUDA-accelerated inference with FP16 support
  • Configurable confidence thresholds for tag generation
  • Comprehensive batch processing capabilities
  • Automatic tag file generation alongside images
  • Supports multiple image formats including JXL

keyframe

Efficient video keyframe extraction tool using FFmpeg:

  • Extracts high-quality keyframes from video files
  • Creates organized output directories automatically
  • Maintains original frame quality and metadata
  • Intelligent I-frame detection and extraction
  • Sequential frame naming with padding
  • Minimal quality loss during extraction
  • Simple command-line interface
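One common way to extract I-frames with FFmpeg is the `select` filter, sketched below as a Python helper that only builds the command line. Whether the script uses this exact invocation is an assumption, as are the function name `keyframe_command`, the PNG output, and the naming pattern. Newer FFmpeg builds prefer `-fps_mode vfr` over `-vsync vfr`.

```python
from pathlib import Path


def keyframe_command(video: Path, out_dir: Path, digits: int = 4) -> list[str]:
    """Build an ffmpeg argument list that writes one image per I-frame."""
    # Zero-padded sequential names, e.g. clip_0001.png, clip_0002.png, ...
    pattern = out_dir / f"{video.stem}_%0{digits}d.png"
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", "select='eq(pict_type,I)'",  # pass only I-frames through the filter
        "-vsync", "vfr",                    # do not duplicate frames to fill the gaps
        str(pattern),
    ]
```

To run it, create the output directory first (`out_dir.mkdir(parents=True, exist_ok=True)`) and pass the list to `subprocess.run(cmd, check=True)`.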

chop_blocks

LoRA model manipulation tool for fine-grained, block-level control, built on code from resize-lora by Gaeros:

  • Precise block-level filtering of LoRA models
  • Sophisticated weight adjustment capabilities
  • Full SafeTensors format support
  • Detailed analysis and reporting of model structure
  • Preserves model metadata during modifications
  • Vector string format for block manipulation
  • Supports both SDXL and SD1 naming conventions

🔧 Core Utilities


File Processing (utils/file_processor.py)

Base framework for file processing operations:

  • Abstract base class for consistent file handling
  • Configurable processing options (recursive, dry-run, debug)
  • Built-in logging and error handling
  • Support for multiple file extensions
  • Hidden file filtering

Example usage:

from pathlib import Path

from utils.file_processor import FileProcessor, ProcessorOptions

class MyProcessor(FileProcessor):
    def process_content(self, content: str) -> str:
        # Add your processing logic here
        return content.replace('old', 'new')

# Initialize with options
options = ProcessorOptions(
    recursive=True,
    dry_run=False,
    file_extensions={'.txt', '.md'}
)

# Process files
processor = MyProcessor(options)
processor.process_directory(Path('path/to/directory'))

Internationalization (utils/i18n_utils.py)

Centralized i18n functionality using Python's gettext:

  • System locale detection and setup
  • Translation file management
  • Fallback handling to English
  • Organized locale structure support
  • Simple integration with setup_i18n() function

Example usage:

from utils.i18n_utils import setup_i18n

# Initialize translations for your script
_ = setup_i18n('my_script')

# Use translations in your code
count = 3  # example value
print(_("Processing files..."))
print(_("Found {} images").format(count))
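For readers curious how the English fallback can work, here is a sketch built on the standard library's `gettext.translation` with `fallback=True`. It is not the toolkit's implementation: the name `setup_i18n_sketch` and the default `locales` directory are assumptions.

```python
import gettext
from pathlib import Path


def setup_i18n_sketch(domain: str, locale_dir: Path = Path("locales")):
    """Return a translation function for domain, falling back to the source strings."""
    # With languages=None, gettext picks the language from the environment
    # (LANGUAGE, LC_ALL, LC_MESSAGES, LANG). With fallback=True, a missing
    # .mo file yields NullTranslations, which returns messages unchanged.
    translation = gettext.translation(domain, localedir=str(locale_dir), fallback=True)
    return translation.gettext
```

The lookup path follows the usual gettext layout, `<locale_dir>/<language_code>/LC_MESSAGES/<domain>.mo`.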

Logging (utils/logging_utils.py)

Standardized logging setup across the toolkit:

  • Configurable log levels and directories
  • Console and file output support
  • Formatted logging messages
  • Debug mode toggle
  • Clean handler management

Example usage:

from utils.logging_utils import setup_logger
from pathlib import Path

# Setup logger with file output
logger = setup_logger(
    name="my_script",
    log_dir=Path("logs"),
    debug=True
)

# Use logger
logger.debug("Detailed debug info")
logger.info("Processing started")
logger.warning("Missing optional file")
logger.error("Failed to process file")
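A logger factory with these properties can be sketched with the standard `logging` module alone. The version below is an illustration, not the toolkit's code: the name `setup_logger_sketch`, the message format, and the `<name>.log` filename are assumptions.

```python
import logging
from pathlib import Path
from typing import Optional


def setup_logger_sketch(name: str, log_dir: Optional[Path] = None, debug: bool = False) -> logging.Logger:
    """Create a logger with console output and, optionally, a log file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)

    # Remove handlers left over from earlier calls so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_dir is not None:
        log_dir.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger
```

Clearing old handlers first is what makes repeated calls with the same name safe.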

Image Processing Utilities (caption/imgproc_utils.py)

Common utilities for image processing tasks:

  • Colored logging output
  • File discovery and filtering
  • Batch processing support
  • Output path management
  • Processing validation
  • Multiple image format support

Example usage:

from caption.imgproc_utils import ProcessingOptions, find_images, batch_iterator
from pathlib import Path

# Setup options
opts = ProcessingOptions(
    recursive=True,
    batch_size=32,
    supported_extensions={'.png', '.jpg'}
)

# Find and process images
image_dir = Path('images')
for batch in batch_iterator(find_images(image_dir, opts), opts.batch_size):
    # Process batch of images
    for image_path in batch:
        print(f"Processing {image_path}")
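The batching used in the example above can be pictured as follows. This is a generic sketch built on `itertools.islice`, not the code in `caption/imgproc_utils.py`; the name `batch_iterator_sketch` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_iterator_sketch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch
```

Because it consumes the input lazily, it works with generators such as a recursive file search without loading every path into memory first.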

Image Processing Base (caption/imgproc_base.py)

Abstract base class for image processors:

  • CUDA/CPU device management
  • Standard processing workflow
  • Result saving functionality
  • Error handling
  • PIL image support with JXL compatibility

Example usage:

from caption.imgproc_base import ImageProcessor
from caption.imgproc_utils import ProcessingOptions
from PIL import Image
from pathlib import Path

class MyImageProcessor(ImageProcessor):
    def load_models(self) -> None:
        # Load your ML models here; load_my_model() is a placeholder for your own loader
        self.model = load_my_model()
    
    def process_image(self, image: Image.Image, image_path: Path) -> str:
        # Process the image and return result
        return "processed image result"

# Initialize and use
processor = MyImageProcessor(ProcessingOptions())
processor.load_models()
processor.process_file(Path('image.jpg'), Path('output'))

Batch Processing (utils/batch_processor.py)

Generic batch processing framework:

  • Parallel processing support
  • Configurable batch sizes
  • Multi-worker processing
  • CUDA/CPU device management
  • Progress tracking
  • Type-safe generic implementation
  • Automatic worker count optimization

Example usage:

from utils.batch_processor import BatchProcessor, BatchOptions
from pathlib import Path

class MyBatchProcessor(BatchProcessor[Path, str]):
    def process_item(self, item: Path) -> str:
        # Process single item
        return f"Processed {item.name}"
    
    def should_process_item(self, item: Path) -> bool:
        return item.suffix in {'.png', '.jpg'}

# Initialize processor
opts = BatchOptions(
    batch_size=32,
    num_workers=4,
    device="cuda"
)

# Process files
processor = MyBatchProcessor(opts)
files = Path('data').glob('*')
results = list(processor.process_all(files, parallel=True))

📁 Directory Structure

The utility modules are organized as follows:

~/toolkit/
├── utils/
│   ├── file_processor.py
│   ├── i18n_utils.py
│   ├── logging_utils.py
│   ├── batch_processor.py
│   └── locales/
│       └── [language_code]/
│           └── LC_MESSAGES/
│               └── [domain].mo
└── caption/
    ├── imgproc_utils.py
    └── imgproc_base.py

🚀 Installation


  1. Clone the repository: (optional)
git clone https://huggingface.co/k4d3/toolkit
  2. Add the repository to your PATH: (optional)
export PATH="$PATH:$HOME/path/to/toolkit"
  3. Source the bundled .zshrc from your own ~/.zshrc: (optional; you will need to adapt it to your setup)
source ~/path/to/toolkit/.zshrc
nano ~/.zshrc

📝 Requirements


  • Miniconda, with an environment set up for training with sd-scripts and running inference with timm, llama, etc.
  • ZSH shell (optional)
  • CUDA-capable GPU (recommended)
  • Required Python packages:
    • torch
    • transformers
    • pillow
    • pillow-jxl
    • opencv-python
    • numpy
    • and a lot more

🔧 Usage


Each script can be used independently or as part of a workflow. Here are some usage examples:

XY Plot

The xyplot script creates image comparison grids with customizable layouts and labels. It's particularly useful for comparing model outputs, hyperparameter studies, or any image comparison tasks.

Basic usage:

xyplot image1.png image2.png image3.png --output comparison.jpg

Grid layout with row and column labels:

xyplot \
    output1.png output2.png output3.png \
    output4.png output5.png output6.png \
    --rows 2 \
    --column-labels "Model A" "Model B" "Model C" \
    --row-labels "Prompt 1" "Prompt 2" \
    --output grid_comparison.jpg

Compare different model outputs:

xyplot ./ComfyUI_00341_.png ./ComfyUI_00342_.png ./ComfyUI_00346_.png \
    --column-labels "No LoRA" "LoRA (weight: 1.0)" "LoRA (weight: 1.4)" \
    --rows 1 \
    --output lora_comparison.png

Multiple rows with seed variations:

xyplot \
    seed1_modelA.png seed1_modelB.png \
    seed2_modelA.png seed2_modelB.png \
    seed3_modelA.png seed3_modelB.png \
    --rows 3 \
    --column-labels "Model A" "Model B" \
    --row-labels "Seed 1" "Seed 2" "Seed 3" \
    --output seed_study.jpg

Parameters:

  • --rows: Number of rows in the grid (default: 1)
  • --labels: Optional labels for each image
  • --row-labels: Optional labels for each row
  • --column-labels: Optional labels for each column
  • --output: Output filename (default: output.jpg)
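The relationship between the image count, `--rows`, and the resulting columns can be illustrated with a little arithmetic. The sketch assumes row-major filling (the first images fill the first row), which matches the examples above but is not confirmed by the script's source. The helper names are hypothetical.

```python
import math


def grid_shape(num_images: int, rows: int = 1) -> tuple[int, int]:
    """Return (rows, cols) so that rows * cols can hold every image."""
    if num_images < 1 or rows < 1:
        raise ValueError("num_images and rows must be positive")
    return rows, math.ceil(num_images / rows)


def cell_origins(num_images: int, rows: int, cell_w: int, cell_h: int) -> list[tuple[int, int]]:
    """Top-left pixel position of each image, filled row by row."""
    _, cols = grid_shape(num_images, rows)
    return [((i % cols) * cell_w, (i // cols) * cell_h) for i in range(num_images)]
```

For six images and `--rows 2`, this gives three columns, so three column labels and two row labels line up with the grid.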

JoyCaption

joy --feed-from-tags=10 --custom_prompt="Write a very long descriptive caption for this image in a formal tone. Do not mention feelings and emotions evoked by the image." .

png2mp4

png2mp4 --repeat 16

inject_to_txt

inject_to_txt 1_honovy "honovy"

replace_comma_with_keep_tags_txt

replace_comma_with_keep_tags_txt 1 1_honovy

📦 Directory Structure


~/
├── datasets/
├── output_dir/
├── models/
├── toolkit/

📄 License


WTFPL - Do what the fuck you want with it.

The included data and models are copyrighted by their respective owners with their own licenses.

🀝 Contributing


Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.

📚 Documentation


If the documentation of a script is missing, ask a language model about it.
