Z-Image-Turbo-Fun-Controlnet-Union

GitHub: https://github.com/aigc-apps/VideoX-Fun

Model Features

  • The ControlNet branch is attached to 6 blocks of the base model.
  • The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
  • It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet; see the sketch after this list for preparing a control image.
  • You can raise control_context_scale for stronger control and better detail preservation; the optimal range is 0.65 to 0.80. For better stability, we highly recommend using a detailed prompt.
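
For conditions like Canny, the control image comes from the corresponding preprocessor. Below is a minimal sketch using OpenCV (an assumption; any edge detector works), with illustrative paths and thresholds:

# Prepare a Canny control image (sketch; paths and thresholds are illustrative)
import cv2

image = cv2.imread("input.jpg")                 # source image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # Canny expects a single-channel image
edges = cv2.Canny(gray, 100, 200)               # low/high hysteresis thresholds
cv2.imwrite("control_canny.png", edges)         # pass this file as the control condition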

TODO

  • Train on more data and for more steps.
  • Support inpaint mode.

Results

[Example outputs: Pose (two examples), Canny, HED, and Depth]

Inference

Go to the VideoX-Fun repository for more details.

Please clone the VideoX-Fun repository and create the required directories:

# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model

Then download the weights into models/Diffusion_Transformer and models/Personalized_Model, arranged as follows:

📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
└── 📂 Personalized_Model/
    └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
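
One way to fetch the weights is via huggingface_hub. This is a sketch: the repo IDs below are placeholders, so substitute the actual Hugging Face repositories.

# Download weights with huggingface_hub (sketch; repo IDs are placeholders)
from huggingface_hub import hf_hub_download, snapshot_download

# Base Z-Image-Turbo weights
snapshot_download(
    repo_id="<org>/Z-Image-Turbo",  # placeholder repo ID
    local_dir="models/Diffusion_Transformer/Z-Image-Turbo",
)

# ControlNet checkpoint
hf_hub_download(
    repo_id="<org>/Z-Image-Turbo-Fun-Controlnet-Union",  # placeholder repo ID
    filename="Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    local_dir="models/Personalized_Model",
)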

Finally, run examples/z_image_fun/predict_t2i_control.py.
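
Generation settings such as the prompt, the control image, and control_context_scale are configured according to the script's interface, which may change; check the script in the repository for the current options. From the VideoX-Fun root:

# Run the text-to-image ControlNet example
python examples/z_image_fun/predict_t2i_control.py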
