# Z-Image-Turbo-Fun-Controlnet-Union
## Model Features
- The ControlNet is injected into 6 blocks of the base model.
- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at a resolution of 1328 using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
- It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet (a sketch of preparing a condition map follows this list).
- You can adjust `control_context_scale` for stronger control and better detail preservation; the optimal range is 0.65 to 0.80 (see the sweep sketch in the Inference section). For better stability, we highly recommend using a detailed prompt.
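As a concrete illustration of preparing one of these condition maps, the sketch below builds a Canny edge image with OpenCV. The file names and thresholds are placeholder choices, not values prescribed by this model.

```python
# Minimal sketch: build a 3-channel Canny edge map to use as a control image.
# File names and thresholds are illustrative, not model-prescribed.
import cv2
import numpy as np

image = cv2.imread("input.jpg")                    # source photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)     # Canny expects 8-bit grayscale
edges = cv2.Canny(gray, 100, 200)                  # single-channel edge map
control = np.stack([edges] * 3, axis=-1)           # replicate to 3 channels
cv2.imwrite("canny_control.png", control)
```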
## TODO
- Train on more data and for more steps.
- Support inpaint mode.
## Results
*Condition/output example pairs for Pose (two examples), Canny, HED, and Depth. (Images omitted.)*
## Inference
See the VideoX-Fun repository for more details. Clone the repository and create the required model directories:
```shell
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```
Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model` so the tree looks like this:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
└── 📂 Personalized_Model/
    └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
```
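If you prefer to fetch the weights programmatically, a `huggingface_hub` sketch like the one below can populate that tree. Both repo IDs are placeholders; substitute the actual Hugging Face repository names for the base model and this ControlNet.

```python
# Sketch: download weights with huggingface_hub.
# NOTE: both repo IDs below are placeholders -- substitute the real
# Hugging Face repository names for the base model and this ControlNet.
from huggingface_hub import hf_hub_download, snapshot_download

# Base model -> models/Diffusion_Transformer/Z-Image-Turbo
snapshot_download(
    repo_id="<org>/Z-Image-Turbo",
    local_dir="models/Diffusion_Transformer/Z-Image-Turbo",
)

# ControlNet weights -> models/Personalized_Model/
hf_hub_download(
    repo_id="<org>/Z-Image-Turbo-Fun-Controlnet-Union",
    filename="Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    local_dir="models/Personalized_Model",
)
```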
Then run `examples/z_image_fun/predict_t2i_control.py`.
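The script's own options are the source of truth. Purely as an illustration of the `control_context_scale` recommendation above, a sweep might look like the sketch below, where `pipe` stands in for whatever pipeline object the script constructs and every keyword name is an assumption based on this card, not the script's real interface.

```python
# Hypothetical sketch: compare control strengths across the recommended range.
# `pipe` is a stand-in for the pipeline that
# examples/z_image_fun/predict_t2i_control.py builds; the call signature
# below is assumed from this card, not taken from the script.
from PIL import Image

def sweep_control_scale(pipe, prompt: str, control_path: str) -> None:
    """Render one prompt at several control strengths for comparison."""
    control_image = Image.open(control_path)
    for scale in (0.65, 0.70, 0.75, 0.80):    # recommended range per this card
        result = pipe(
            prompt=prompt,                    # detailed prompts are advised
            control_image=control_image,
            control_context_scale=scale,
        ).images[0]
        result.save(f"output_scale_{scale:.2f}.png")
```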