AI Image Processing Toolkit
A collection of specialized scripts for AI image processing, dataset preparation, and model training workflows.
Scripts Overview
wdv3
An image tagging script that uses SmilingWolf's WD V3 tagger models, based on this repo. It supports multiple model architectures (ViT, SwinV2, ConvNeXt) and can process single images or entire directories recursively.
Features
- Multiple model architecture support
- Batch processing capabilities
- Adjustable confidence thresholds
- CUDA acceleration with FP16 support
- JXL image format support
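The adjustable threshold can be pictured as a simple filter over per-tag probabilities. The following is a minimal sketch, assuming the model yields a mapping from tag name to confidence. The function name `filter_tags`, the 0.35 default, and the sample scores are hypothetical and not taken from the wdv3 script.

```python
from typing import Dict, List


def filter_tags(scores: Dict[str, float], threshold: float = 0.35) -> List[str]:
    """Return tags whose confidence meets the threshold, highest first.

    Hypothetical helper: the 0.35 default is an illustrative value,
    not necessarily what the wdv3 script uses.
    """
    kept = [(tag, p) for tag, p in scores.items() if p >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in kept]


# Made-up scores for illustration only.
example_scores = {"outdoors": 0.91, "tree": 0.62, "car": 0.12}
print(", ".join(filter_tags(example_scores, threshold=0.5)))  # prints: outdoors, tree
```

Raising the threshold trades recall for precision: fewer tags are written, but those that remain are more likely to be correct.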
train_functions
A set of ZSH functions for managing AI model training workflows:
- Script execution management
- Training variable setup
- Git repository state tracking
- Output directory management
- Automatic cleanup of empty outputs
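The real functions are written for ZSH, but the cleanup idea is easy to show in Python. This is only a sketch of one plausible approach, assuming "empty outputs" means output directories that contain no files. `remove_empty_dirs` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def remove_empty_dirs(output_root: Path) -> List[Path]:
    """Delete empty directories below output_root and return the removed paths."""
    removed: List[Path] = []
    directories = [p for p in output_root.rglob("*") if p.is_dir()]
    # Visit the deepest paths first, so a parent that becomes empty
    # after its children are removed is cleaned up as well.
    for directory in sorted(directories, key=lambda p: len(p.parts), reverse=True):
        if not any(directory.iterdir()):
            directory.rmdir()
            removed.append(directory)
    return removed
```

The root directory itself is never removed, only its empty descendants.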
git-wrapper
Enhanced Git functionality for dataset management:
- Automatic submodule handling
- LFS integration for JXL files
- Dataset-specific Git attributes management
check4sig
Utility for detecting watermark mentions in dataset caption files:
- Scans .caption files for watermark-related text
- Batch processing support
- Interactive editing with nvim
- Recursive directory scanning
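The scanning step could look roughly like the sketch below. It assumes a plain substring match against a short keyword list. The keyword list and the name `find_watermark_mentions` are illustrative assumptions, and the interactive nvim step is not covered.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Illustrative keyword list; the real script's list may differ.
WATERMARK_KEYWORDS = ("watermark", "signature", "logo")


def find_watermark_mentions(
    root: Path, keywords: Iterable[str] = WATERMARK_KEYWORDS
) -> Iterator[Tuple[Path, str]]:
    """Yield (file, keyword) for each .caption file under root that mentions a keyword."""
    lowered = [k.lower() for k in keywords]
    for caption_file in sorted(root.rglob("*.caption")):
        text = caption_file.read_text(encoding="utf-8", errors="replace").lower()
        for keyword in lowered:
            if keyword in text:
                yield caption_file, keyword
                break  # one hit per file is enough to flag it
```

Each flagged file could then be opened in an editor for manual review.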
gallery-dl
Directory-aware wrapper for gallery-dl:
- Automatically changes to ~/datasets directory
- Maintains consistent download locations
- Preserves original command functionality
joy
JoyCaption, an image captioning system by fancyfeast that combines CLIP with an LLM:
- Multiple caption styles (descriptive, training prompts, art critic, etc.)
- Custom image adapters
- Tag-based caption generation
- Batch processing support
png2mp4
The png2mp4 script converts sequences of PNG images into MP4 videos, which is particularly useful for visualizing training progress or creating animation sequences. It supports multiple sample sets and various customization options.
Basic usage (repeat each frame 16 times):
png2mp4 --repeat 16
Automatically detect step size from filenames:
png2mp4 --steps-from-filename --repeat 8
Add step counter overlay and limit frames:
png2mp4 --step 50 --max 100 --repeat 4
Parameters:
- --repeat: Number of times to repeat each frame (default: 1)
- --step: Step multiplier for the frame counter overlay
- --max: Maximum number of frames to include
- --steps-from-filename: Automatically calculate step size from filename patterns
Features:
- Automatically processes multiple sample sets in the current directory
- Creates high-quality MP4s with configurable bitrate (12Mbps)
- Adds fade-out effect at the end
- Supports step counter overlay
- Creates temporary files in ~/.local/tmp
- Output format: {current_directory_name}_sample{N}.mp4
Example filename patterns:
output_01_000000.png # First frame, sample 01
output_01_000100.png # Second frame, sample 01
output_02_000000.png # First frame, sample 02
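To illustrate how filenames like these can be grouped into sample sets and how a step size might be derived from them, here is a small sketch. It assumes the pattern `<prefix>_<sample>_<step>.png` seen in the examples above. The regular expression and both function names are assumptions, not the script's actual code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Assumed pattern: <prefix>_<sample>_<step>.png, as in output_01_000100.png
FRAME_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<sample>\d+)_(?P<step>\d+)\.png$")


def group_frames(filenames: Iterable[str]) -> Dict[str, List[int]]:
    """Map each sample id to its sorted step numbers, ignoring other files."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name in filenames:
        match = FRAME_PATTERN.match(name)
        if match:
            groups[match.group("sample")].append(int(match.group("step")))
    return {sample: sorted(steps) for sample, steps in groups.items()}


def detect_step_size(steps: List[int]) -> Optional[int]:
    """Return the smallest gap between consecutive steps, or None if undetermined."""
    if len(steps) < 2:
        return None
    return min(b - a for a, b in zip(steps, steps[1:]))
```

With the three example filenames, sample 01 has steps 0 and 100, so the detected step size would be 100, while sample 02 has a single frame and no detectable step size.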
xyplot
Image comparison grid generator:
- Supports multiple image formats
- Customizable grid layouts
- Optional row/column labels
- Automatic image padding and alignment
concat_captions
Utility for combining multiple caption files:
- Merges .caption and .tags files
- Maintains original image associations
- Batch processing support
- Error handling for missing files
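A merge of this kind might be sketched as follows. The output order (tags first, then caption) and the ", " separator are assumptions, and `combine_caption_files` is a hypothetical name. The actual script may format its output differently.

```python
from pathlib import Path
from typing import Optional


def combine_caption_files(image_path: Path) -> Optional[str]:
    """Join the .tags and .caption text that belong to image_path.

    Returns None when either file is missing, so the caller can
    report the problem instead of writing a partial result.
    """
    tags_file = image_path.with_suffix(".tags")
    caption_file = image_path.with_suffix(".caption")
    if not tags_file.is_file() or not caption_file.is_file():
        return None
    tags = tags_file.read_text(encoding="utf-8").strip()
    caption = caption_file.read_text(encoding="utf-8").strip()
    # Assumed order: tags first, then the caption.
    return f"{tags}, {caption}"
```

Deriving both paths from the image path is what keeps the combined text associated with the original image.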
stats
Directory analysis and statistics generation tool that provides detailed file counts and metrics:
- Detailed file counting by extension with color-coded output for different file types (JXL, PNG, JPG, etc.)
- Multiple sorting options (by name, count, or specific file types)
- Recursive directory scanning with aggregated statistics
- Color-coded thresholds for dataset size evaluation
- Automatic categorization of files into image and text groups
- Grand total calculations across all subdirectories
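The core of such a tool is a recursive count by file extension. The sketch below shows that idea without the color output or sorting options. The extension groups and function names are illustrative assumptions.

```python
from collections import Counter
from pathlib import Path
from typing import Dict

# Assumed groupings for illustration; the script's own lists may differ.
IMAGE_EXTENSIONS = {".jxl", ".png", ".jpg", ".jpeg", ".webp"}
TEXT_EXTENSIONS = {".txt", ".caption", ".tags"}


def count_by_extension(root: Path) -> Counter:
    """Count files under root (recursively) by lowercased extension."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def summarize(counts: Counter) -> Dict[str, int]:
    """Aggregate per-extension counts into image, text, and grand totals."""
    return {
        "images": sum(n for ext, n in counts.items() if ext in IMAGE_EXTENSIONS),
        "text": sum(n for ext, n in counts.items() if ext in TEXT_EXTENSIONS),
        "total": sum(counts.values()),
    }
```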
shortcode
Hugo-compatible shortcode generator for image galleries with blurhash integration:
- Generates Hugo-compatible shortcode blocks for each image
- Integrates blurhash codes for progressive image loading
- Automatically extracts and includes image dimensions
- Preserves and integrates image captions from metadata
- Supports grid layout configurations
- Processes directories recursively while maintaining structure
- Handles relative path resolution for static content
yiffdata
Comprehensive image metadata extraction and JSON generation utility:
- Extracts precise image dimensions using PIL
- Combines existing blurhash codes from .bh files
- Integrates caption data from .caption files
- Generates consolidated JSON output with all metadata
- Maintains original filename references
- Supports batch processing of entire directories
- Preserves file relationships and metadata hierarchy
txt2tags
Batch file extension conversion utility for dataset management:
- Converts .txt files to .tags format for ML training compatibility
- Preserves original file content and structure
- Supports recursive directory traversal
- Interactive mode for selective conversion
- Maintains original file timestamps and permissions
- Simple command-line interface with directory input
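Since only the extension changes, the conversion amounts to a rename, which keeps content, timestamps, and permissions intact. Here is a minimal sketch of that idea. The function name, the `dry_run` flag, and the rule of skipping existing targets are assumptions rather than documented behavior of txt2tags.

```python
from pathlib import Path
from typing import List


def rename_txt_to_tags(root: Path, recursive: bool = True, dry_run: bool = False) -> List[Path]:
    """Rename *.txt files to *.tags and return the target paths."""
    pattern = "**/*.txt" if recursive else "*.txt"
    renamed: List[Path] = []
    # sorted() collects all matches before any file is renamed.
    for txt_file in sorted(root.glob(pattern)):
        if not txt_file.is_file():
            continue
        target = txt_file.with_suffix(".tags")
        if target.exists():
            continue  # never overwrite an existing .tags file
        if not dry_run:
            txt_file.rename(target)
        renamed.append(target)
    return renamed
```

Calling it with `dry_run=True` first gives a preview of what would be renamed.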
txt2emoji
Advanced text-to-emoji conversion system with context awareness:
- Sophisticated word-to-emoji mapping with custom dictionaries
- Context-aware emoji selection to avoid redundancy
- Detailed conversion explanations with rationale
- Batch processing with multiple output formats
- Configurable threshold and filtering options
- NLTK integration for improved text parsing
- Extensive customization options for emoji mappings
jtp2
Image classification system using RedRocket's PILOT2 model:
- Implements Vision Transformer architecture with custom modifications
- Features GatedHead classifier for improved accuracy
- CUDA-accelerated inference with FP16 support
- Configurable confidence thresholds for tag generation
- Comprehensive batch processing capabilities
- Automatic tag file generation alongside images
- Supports multiple image formats including JXL
keyframe
Efficient video keyframe extraction tool using FFmpeg:
- Extracts high-quality keyframes from video files
- Creates organized output directories automatically
- Maintains original frame quality and metadata
- Intelligent I-frame detection and extraction
- Sequential frame naming with padding
- Minimal quality loss during extraction
- Simple command-line interface
chop_blocks
LoRA model manipulation tool for fine-grained, block-level control, built on code from resize-lora by Gaeros:
- Precise block-level filtering of LoRA models
- Sophisticated weight adjustment capabilities
- Full SafeTensors format support
- Detailed analysis and reporting of model structure
- Preserves model metadata during modifications
- Vector string format for block manipulation
- Supports both SDXL and SD1 naming conventions
Core Utilities
File Processing (utils/file_processor.py)
Base framework for file processing operations:
- Abstract base class for consistent file handling
- Configurable processing options (recursive, dry-run, debug)
- Built-in logging and error handling
- Support for multiple file extensions
- Hidden file filtering
Example usage:
from pathlib import Path

from utils.file_processor import FileProcessor, ProcessorOptions

class MyProcessor(FileProcessor):
    def process_content(self, content: str) -> str:
        # Add your processing logic here
        return content.replace('old', 'new')

# Initialize with options
options = ProcessorOptions(
    recursive=True,
    dry_run=False,
    file_extensions={'.txt', '.md'}
)

# Process files
processor = MyProcessor(options)
processor.process_directory(Path('path/to/directory'))
Internationalization (utils/i18n_utils.py)
Centralized i18n functionality using Python's gettext:
- System locale detection and setup
- Translation file management
- Fallback handling to English
- Organized locale structure support
- Simple integration with the setup_i18n() function
Example usage:
from utils.i18n_utils import setup_i18n

# Initialize translations for your script
_ = setup_i18n('my_script')

# Use translations in your code
count = 3  # placeholder value for illustration
print(_("Processing files..."))
print(_("Found {} images").format(count))
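For readers curious how the English fallback can work, the standard library's `gettext.translation` accepts `fallback=True`, which returns a pass-through translator when no compiled catalog is found. The sketch below assumes that approach. `setup_i18n_sketch` is a hypothetical stand-in and may differ from the real `setup_i18n()`.

```python
import gettext
from pathlib import Path
from typing import Callable


def setup_i18n_sketch(domain: str, locale_dir: Path) -> Callable[[str], str]:
    """Return a translation function for domain, falling back to the source strings.

    Hypothetical stand-in for setup_i18n(); locale_dir is expected to contain
    [language_code]/LC_MESSAGES/[domain].mo files.
    """
    translation = gettext.translation(
        domain,
        localedir=str(locale_dir),
        fallback=True,  # no .mo file found -> return messages unchanged
    )
    return translation.gettext


_ = setup_i18n_sketch("my_script", Path("utils/locales"))
print(_("Processing files..."))
```

When no catalog exists for the active locale, the untranslated English string is printed.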
Logging (utils/logging_utils.py)
Standardized logging setup across the toolkit:
- Configurable log levels and directories
- Console and file output support
- Formatted logging messages
- Debug mode toggle
- Clean handler management
Example usage:
from utils.logging_utils import setup_logger
from pathlib import Path

# Setup logger with file output
logger = setup_logger(
    name="my_script",
    log_dir=Path("logs"),
    debug=True
)

# Use logger
logger.debug("Detailed debug info")
logger.info("Processing started")
logger.warning("Missing optional file")
logger.error("Failed to process file")
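A helper with these features can be built from the standard `logging` module alone. The following sketch shows one way to combine console and file output while avoiding duplicate handlers on repeated calls. The name `setup_logger_sketch`, the message format, and the `<name>.log` filename are assumptions, not the toolkit's actual implementation.

```python
import logging
import sys
from pathlib import Path
from typing import Optional


def setup_logger_sketch(
    name: str, log_dir: Optional[Path] = None, debug: bool = False
) -> logging.Logger:
    """Create a logger with console output and optional file output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)

    # Remove handlers left over from earlier calls so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler(sys.stderr)
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_dir is not None:
        log_dir.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger
```

Clearing existing handlers first is what makes it safe to call the function more than once for the same logger name.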
Image Processing Utilities (caption/imgproc_utils.py)
Common utilities for image processing tasks:
- Colored logging output
- File discovery and filtering
- Batch processing support
- Output path management
- Processing validation
- Multiple image format support
Example usage:
from caption.imgproc_utils import ProcessingOptions, find_images, batch_iterator
from pathlib import Path

# Setup options
opts = ProcessingOptions(
    recursive=True,
    batch_size=32,
    supported_extensions={'.png', '.jpg'}
)

# Find and process images
image_dir = Path('images')
for batch in batch_iterator(find_images(image_dir, opts), opts.batch_size):
    # Process batch of images
    for image_path in batch:
        print(f"Processing {image_path}")
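The batching used in the loop above can be understood as slicing any iterable into fixed-size lists. This is a minimal sketch of that technique using `itertools.islice`. `batch_iterator_sketch` is a hypothetical name, and the toolkit's own `batch_iterator` may be implemented differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_iterator_sketch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because it consumes the input lazily, this works with generators such as a recursive file search without loading every path into memory first.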
Image Processing Base (caption/imgproc_base.py)
Abstract base class for image processors:
- CUDA/CPU device management
- Standard processing workflow
- Result saving functionality
- Error handling
- PIL image support with JXL compatibility
Example usage:
from caption.imgproc_base import ImageProcessor
from caption.imgproc_utils import ProcessingOptions
from PIL import Image
from pathlib import Path

class MyImageProcessor(ImageProcessor):
    def load_models(self) -> None:
        # Load your ML models here (load_my_model is a placeholder for your own loader)
        self.model = load_my_model()

    def process_image(self, image: Image.Image, image_path: Path) -> str:
        # Process the image and return result
        return "processed image result"

# Initialize and use
processor = MyImageProcessor(ProcessingOptions())
processor.load_models()
processor.process_file(Path('image.jpg'), Path('output'))
Batch Processing (utils/batch_processor.py)
Generic batch processing framework:
- Parallel processing support
- Configurable batch sizes
- Multi-worker processing
- CUDA/CPU device management
- Progress tracking
- Type-safe generic implementation
- Automatic worker count optimization
Example usage:
from utils.batch_processor import BatchProcessor, BatchOptions
from pathlib import Path

class MyBatchProcessor(BatchProcessor[Path, str]):
    def process_item(self, item: Path) -> str:
        # Process single item
        return f"Processed {item.name}"

    def should_process_item(self, item: Path) -> bool:
        # Only handle PNG and JPG files
        return item.suffix in {'.png', '.jpg'}

# Initialize processor
opts = BatchOptions(
    batch_size=32,
    num_workers=4,
    device="cuda"
)

# Process files
processor = MyBatchProcessor(opts)
files = Path('data').glob('*')
results = list(processor.process_all(files, parallel=True))
Directory Structure
The utility modules are organized as follows:
~/toolkit/
├── utils/
│   ├── file_processor.py
│   ├── i18n_utils.py
│   ├── logging_utils.py
│   ├── batch_processor.py
│   └── locales/
│       └── [language_code]/
│           └── LC_MESSAGES/
│               └── [domain].mo
└── caption/
    ├── imgproc_utils.py
    └── imgproc_base.py
Installation
- Clone the repository (optional):
git clone https://huggingface.co/k4d3/toolkit
- Add the repository to your PATH (optional):
export PATH="$PATH:$HOME/path/to/toolkit"
- Source the toolkit's .zshrc from your shell (optional; you will need to adapt it to your setup):
source ~/path/to/toolkit/.zshrc
nano ~/.zshrc
Requirements
- Miniconda, with an environment set up for training with sd-scripts and for inference with timm, llama, etc.
- ZSH shell (optional)
- CUDA-capable GPU (recommended)
- Required Python packages:
- torch
- transformers
- pillow
- pillow-jxl
- opencv-python
- numpy
- and a lot more
Usage
Each script can be used independently or as part of a workflow. Here are some usage examples:
XY Plot
The xyplot script creates image comparison grids with customizable layouts and labels. It is particularly useful for comparing model outputs, hyperparameter studies, or any other image comparison task.
Basic usage:
xyplot image1.png image2.png image3.png --output comparison.jpg
Grid layout with row and column labels:
xyplot \
output1.png output2.png output3.png \
output4.png output5.png output6.png \
--rows 2 \
--column-labels "Model A" "Model B" "Model C" \
--row-labels "Prompt 1" "Prompt 2" \
--output grid_comparison.jpg
Compare different model outputs:
xyplot ./ComfyUI_00341_.png ./ComfyUI_00342_.png ./ComfyUI_00346_.png \
--column-labels "No LoRA" "LoRA (weight: 1.0)" "LoRA (weight: 1.4)" \
--rows 1 \
--output lora_comparison.png
Multiple rows with seed variations:
xyplot \
seed1_modelA.png seed1_modelB.png \
seed2_modelA.png seed2_modelB.png \
seed3_modelA.png seed3_modelB.png \
--rows 3 \
--column-labels "Model A" "Model B" \
--row-labels "Seed 1" "Seed 2" "Seed 3" \
--output seed_study.jpg
Parameters:
- --rows: Number of rows in the grid (default: 1)
- --labels: Optional labels for each image
- --row-labels: Optional labels for each row
- --column-labels: Optional labels for each column
- --output: Output filename (default: output.jpg)
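To make the relationship between --rows and the resulting grid concrete, the sketch below computes where each image would land when the grid is filled row by row. It assumes equally sized cells and ignores labels and padding. `grid_positions` is a hypothetical helper, not a function from xyplot.

```python
import math
from typing import List, Tuple


def grid_positions(
    num_images: int, rows: int, cell_width: int, cell_height: int
) -> List[Tuple[int, int]]:
    """Return the top-left pixel offset of each image, filled row by row."""
    if num_images < 1 or rows < 1:
        raise ValueError("num_images and rows must be at least 1")
    # The column count follows from the image count and the requested rows.
    columns = math.ceil(num_images / rows)
    return [
        ((index % columns) * cell_width, (index // columns) * cell_height)
        for index in range(num_images)
    ]
```

With six images and `rows=2`, this yields three columns, which matches the three column labels and two row labels in the grid example above.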
JoyCaption
joy --feed-from-tags=10 --custom_prompt="Write a very long descriptive caption for this image in a formal tone. Do not mention feelings and emotions evoked by the image." .
png2mp4
png2mp4 --repeat 16
inject_to_txt
inject_to_txt 1_honovy "honovy"
replace_comma_with_keep_tags_txt
replace_comma_with_keep_tags_txt 1 1_honovy
Directory Structure
~/
├── datasets/
├── output_dir/
├── models/
└── toolkit/
License
WTFPL - Do what the fuck you want with it.
The included data and models are copyrighted by their respective owners with their own licenses.
Contributing
Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.
Documentation
If the documentation of a script is missing, ask a language model about it.