# Z-Image-Turbo-Fun-Controlnet-Union
## Model Features
- The ControlNet is injected into 6 blocks of the base model.
- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at a resolution of 1328 using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
- It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet (a sketch of preparing a condition map follows this list).
- You can adjust `control_context_scale` for stronger control and better detail preservation; the optimal range is 0.65 to 0.80 (see the sweep sketch in the Inference section). For better stability, we highly recommend using a detailed prompt.
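As a concrete illustration of preparing one of these condition maps, the sketch below builds a Canny edge image with OpenCV. The file names and thresholds are placeholder choices, not values prescribed by this model.

```python
# Minimal sketch: build a 3-channel Canny edge map to use as a control image.
# File names and thresholds are illustrative, not model-prescribed.
import cv2
import numpy as np

image = cv2.imread("input.jpg")                    # source photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)     # Canny expects 8-bit grayscale
edges = cv2.Canny(gray, 100, 200)                  # single-channel edge map
control = np.stack([edges] * 3, axis=-1)           # replicate to 3 channels
cv2.imwrite("canny_control.png", control)
```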
## TODO
- Train on more data and for more steps.
- Support inpaint mode.
## Results
*Condition/output example pairs for Pose (two examples), Canny, HED, and Depth. (Images omitted.)*
## Inference
See the VideoX-Fun repository for more details. Clone the repository and create the required model directories:
```shell
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```
Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model` so the tree looks like this:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
└── 📂 Personalized_Model/
    └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
```
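If you prefer to fetch the weights programmatically, a `huggingface_hub` sketch like the one below can populate that tree. Both repo IDs are placeholders; substitute the actual Hugging Face repository names for the base model and this ControlNet.

```python
# Sketch: download weights with huggingface_hub.
# NOTE: both repo IDs below are placeholders -- substitute the real
# Hugging Face repository names for the base model and this ControlNet.
from huggingface_hub import hf_hub_download, snapshot_download

# Base model -> models/Diffusion_Transformer/Z-Image-Turbo
snapshot_download(
    repo_id="<org>/Z-Image-Turbo",
    local_dir="models/Diffusion_Transformer/Z-Image-Turbo",
)

# ControlNet weights -> models/Personalized_Model/
hf_hub_download(
    repo_id="<org>/Z-Image-Turbo-Fun-Controlnet-Union",
    filename="Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    local_dir="models/Personalized_Model",
)
```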
Then run `examples/z_image_fun/predict_t2i_control.py`.
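The script's own options are the source of truth. Purely as an illustration of the `control_context_scale` recommendation above, a sweep might look like the sketch below, where `pipe` stands in for whatever pipeline object the script constructs and every keyword name is an assumption based on this card, not the script's real interface.

```python
# Hypothetical sketch: compare control strengths across the recommended range.
# `pipe` is a stand-in for the pipeline that
# examples/z_image_fun/predict_t2i_control.py builds; the call signature
# below is assumed from this card, not taken from the script.
from PIL import Image

def sweep_control_scale(pipe, prompt: str, control_path: str) -> None:
    """Render one prompt at several control strengths for comparison."""
    control_image = Image.open(control_path)
    for scale in (0.65, 0.70, 0.75, 0.80):    # recommended range per this card
        result = pipe(
            prompt=prompt,                    # detailed prompts are advised
            control_image=control_image,
            control_context_scale=scale,
        ).images[0]
        result.save(f"output_scale_{scale:.2f}.png")
```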