CycleGAN_Depth2RobotsV2_Blend Model

This model uses a CycleGAN architecture to transform depth maps into robot-style images and, in the reverse direction, to estimate depth maps from robot-style images.

depth map → robot-style image
robot-style image → depth map

Model Description

  • This model was trained on robot images generated with SDXL and their associated depth maps, extracted with Depth-Anything-V2: Depth2RobotsV2_Annotations
  • It was trained using the CycleGAN architecture
  • Training notebooks and dataset generators can be found in the src folder and in the GitHub repo [Leoleojames1/CycleGANControlNet2Anything](https://github.com/Leoleojames1/CycleGANControlNet2Anything)
  • It supports bidirectional transformation:
    • Depth map → Robot-style imagery
    • Robot-style imagery → Depth map
  • The model uses a ResNet-based generator with residual blocks

Installation

# Clone the repository
git clone https://huggingface.co/Borcherding/CycleGAN_Depth2RobotsV2_Blend
cd CycleGAN_Depth2RobotsV2_Blend

# Install dependencies
pip install torch torchvision gradio pyvirtualcam

Usage Options

Option 1: Simple Test Interface

Run the simple test interface to quickly try out the model:

python cycleGANtest.py

This launches a Gradio interface where you can:

  • Upload an image
  • Select conversion direction (Depth to Image or Image to Depth)
  • Transform the image with a single click
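
If you want to see how such an interface might be wired up, the sketch below is a rough, hypothetical example using Gradio and the transform_image helper defined in the "Using the Model Programmatically" section; the actual cycleGANtest.py may be organized differently.

import gradio as gr

def run(image_path, direction):
    # direction is "depth2image" or "image2depth"; transform_image is the
    # helper defined in the programmatic example further down this card
    return transform_image(image_path, direction)

demo = gr.Interface(
    fn=run,
    inputs=[
        gr.Image(type="filepath", label="Input image"),
        gr.Radio(["depth2image", "image2depth"], value="depth2image", label="Conversion direction"),
    ],
    outputs=gr.Image(label="Transformed image"),
    title="CycleGAN Depth2RobotsV2 Blend",
)
demo.launch()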

Option 2: Webcam Integration with Depth Estimation

For a more advanced setup that includes real-time webcam processing with Depth Anything V2:

# Set the path to Depth Anything V2
export DEPTH_ANYTHING_V2_PATH=/path/to/depth-anything-v2

# Run the integrated application
python discordDepth2AnythingGAN.py
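
The integrated script is expected to pick up the Depth Anything V2 checkout from this environment variable; a minimal sketch of that lookup (an assumption, not the exact code in discordDepth2AnythingGAN.py) could be:

import os
import sys

# Illustrative: fall back to a local checkout if the variable is not set
depth_anything_path = os.environ.get("DEPTH_ANYTHING_V2_PATH", "./depth-anything-v2")
sys.path.insert(0, depth_anything_path)  # makes the Depth Anything V2 modules importable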

This launches a Gradio interface that allows you to:

  • Capture webcam input
  • Generate depth maps using Depth Anything V2
  • Apply winter-themed colormap to depth maps
  • Apply CycleGAN transformation in either direction
  • Output to a virtual camera for use in video conferencing or streaming
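
A rough sketch of that loop is shown below. It assumes depth_model is an already-loaded Depth Anything V2 model (infer_image is its inference method) and transform_frame is a hypothetical helper wrapping the CycleGAN generator from the programmatic example; the actual discordDepth2AnythingGAN.py may differ.

import cv2
import numpy as np
import pyvirtualcam

cap = cv2.VideoCapture(0)
with pyvirtualcam.Camera(width=640, height=480, fps=30) as cam:
    while True:
        ok, frame = cap.read()  # BGR uint8 frame from the webcam
        if not ok:
            break
        # 1. Estimate depth with Depth Anything V2 (float depth map)
        depth = depth_model.infer_image(frame)
        depth = (255 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)).astype(np.uint8)
        # 2. Apply a winter-themed colormap to the depth map
        depth_colored = cv2.applyColorMap(depth, cv2.COLORMAP_WINTER)
        # 3. Run the CycleGAN generator (here: depth -> robot-style)
        robot = transform_frame(depth_colored)  # hypothetical helper, returns RGB uint8
        # 4. Send the result to the virtual camera
        cam.send(cv2.resize(robot, (cam.width, cam.height)))
        cam.sleep_until_next_frame()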

Using the Model Programmatically

import torch
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms
from PIL import Image
from huggingface_hub import hf_hub_download

# Define the Generator architecture (must match the trained checkpoints)
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv_block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels)
        )

    def forward(self, x):
        return x + self.conv_block(x)

class Generator(nn.Module):
    def __init__(self, input_channels=3, output_channels=3, n_residual_blocks=9):
        super(Generator, self).__init__()
        
        # Initial convolution
        model = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(input_channels, 64, 7),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True)
        ]
        
        # Downsampling
        in_features = 64
        out_features = in_features * 2
        for _ in range(2):
            model += [
                nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
                nn.InstanceNorm2d(out_features),
                nn.ReLU(inplace=True)
            ]
            in_features = out_features
            out_features = in_features * 2
        
        # Residual blocks
        for _ in range(n_residual_blocks):
            model += [ResidualBlock(in_features)]
        
        # Upsampling
        out_features = in_features // 2
        for _ in range(2):
            model += [
                nn.ConvTranspose2d(in_features, out_features, 3, stride=2, padding=1, output_padding=1),
                nn.InstanceNorm2d(out_features),
                nn.ReLU(inplace=True)
            ]
            in_features = out_features
            out_features = in_features // 2
        
        # Output layer
        model += [
            nn.ReflectionPad2d(3),
            nn.Conv2d(64, output_channels, 7),
            nn.Tanh()
        ]
        
        self.model = nn.Sequential(*model)
    
    def forward(self, x):
        return self.model(x)

# Download the model
def download_model(direction="depth2image"):
    if direction == "depth2image":
        filename = "latest_net_G_A.pth"
    else:  # "image2depth"
        filename = "latest_net_G_B.pth"
    
    model_path = hf_hub_download(
        repo_id="Borcherding/CycleGAN_Depth2RobotsV2_Blend", 
        filename=filename
    )
    return model_path

# Image preprocessing
def preprocess_image(image):
    """
    Preprocess image for model input
    
    Args:
        image: PIL Image or numpy array
    
    Returns:
        torch.Tensor: Normalized tensor ready for model input
    """
    if isinstance(image, np.ndarray):
        image = Image.fromarray(image.astype('uint8'), 'RGB')
    
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    
    return transform(image).unsqueeze(0)

# Image postprocessing
def postprocess_image(tensor):
    """
    Convert model output tensor to numpy image
    
    Args:
        tensor: Model output tensor
    
    Returns:
        numpy.ndarray: RGB image array (0-255)
    """
    tensor = tensor.squeeze(0).cpu()
    tensor = (tensor + 1) / 2
    tensor = tensor.clamp(0, 1)
    tensor = tensor.permute(1, 2, 0).numpy()
    return (tensor * 255).astype(np.uint8)

# Example usage
def transform_image(input_image_path, direction="depth2image"):
    """
    Transform an image using the Depth2Robot model
    
    Args:
        input_image_path: Path to input image
        direction: "depth2image" or "image2depth"
    
    Returns:
        numpy.ndarray: Transformed image
    """
    # Load model
    model_path = download_model(direction)
    model = Generator()
    model.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False)
    model.eval()
    
    # Load and preprocess image
    input_image = Image.open(input_image_path).convert('RGB')
    input_tensor = preprocess_image(input_image)
    
    # Generate output
    with torch.no_grad():
        output_tensor = model(input_tensor)
    
    # Postprocess output
    output_image = postprocess_image(output_tensor)
    
    return output_image
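
For example, to convert a depth map on disk and save the result (file names below are placeholders):

robot_image = transform_image("my_depth_map.png", direction="depth2image")
Image.fromarray(robot_image).save("robot_style.png")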

Model Checkpoints

The model checkpoints are available on Hugging Face:

  • Repository: Borcherding/CycleGAN_Depth2RobotsV2_Blend
  • Files:
    • latest_net_G_A.pth - Generator for Depth to Robot Image transformation
    • latest_net_G_B.pth - Generator for Robot Image to Depth transformation

Integration with Depth Anything V2

The integrated application (discordDepth2AnythingGAN.py) also leverages Depth Anything V2 for real-time depth estimation, providing a complete pipeline:

  1. Capture webcam input
  2. Generate depth maps with Depth Anything V2
  3. Apply CycleGAN transformation
  4. Output to virtual camera

Requirements

  • Python 3.7+
  • PyTorch 1.7+
  • torchvision
  • gradio
  • pyvirtualcam (for webcam integration)
  • OpenCV (cv2)
  • Depth Anything V2 (for integrated application)

License

[Insert your license information here]

Acknowledgments
