# Dancing Image-to-Video Generation
This repository contains the steps and scripts needed to generate videos with the Dancing image-to-video model. The model applies LoRA (Low-Rank Adaptation) weights on top of pre-trained Wan2.1 components to create anime-style dancing videos from a source image and a textual prompt.
## Prerequisites
Before proceeding, ensure that you have the following installed on your system:
- Ubuntu (or a compatible Linux distribution)
- Python 3.x
- pip (Python package manager)
- Git
- Git LFS (Git Large File Storage)
- FFmpeg
## Installation

### Update and Install Dependencies

```bash
sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
```
### Clone the Repository

```bash
git clone https://huggingface.co/svjack/Dancing_wan_2_1_14_B_image2video_lora
cd Dancing_wan_2_1_14_B_image2video_lora
```
### Install Python Dependencies

```bash
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6
```
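Before downloading the large weights, it is worth a quick check that PyTorch can see your GPU. A minimal sanity-check sketch:

```python
# Quick environment check: verifies that PyTorch imports and a CUDA GPU is visible.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```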
### Download Model Weights

```bash
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_i2v_480p_14B_bf16.safetensors
```
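As an alternative to wget, the same files can be fetched through `huggingface_hub` (installed above). A minimal sketch for one of the files; the `repo_id`/`filename` pairs mirror the URLs above:

```python
# Alternative download path using huggingface_hub.
# Each (repo_id, filename) pair mirrors one of the wget URLs above.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Wan-AI/Wan2.1-T2V-14B",
    filename="models_t5_umt5-xxl-enc-bf16.pth",
    local_dir=".",
)
```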
## Usage
To generate a video, run the `wan_generate_video.py` script with the appropriate parameters. Below are examples of how to generate videos with the Dancing model.
1. "Animated Character Enjoying a Hamburger"
- Source Image
- Epoch 2 checkpoint (output stays closer to the base image-to-video behavior)
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000002.safetensors \
  --lora_multiplier 1.0 \
  --image_path "red_girl.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is in the midst of enjoying a hamburger, with hands moving gracefully to take bites, sometimes gently holding the burger, other times elegantly wiping crumbs, as if savoring the flavors or following the rhythm of enjoyment. The entire scene is filled with fluidity and charm, captivating the audience with its authenticity and expressiveness"
```
- Epoch 6 checkpoint (output leans further into the trained dancing style)
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000006.safetensors \
  --lora_multiplier 1.0 \
  --image_path "red_girl.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is in the midst of enjoying a hamburger, with hands moving gracefully to take bites, sometimes gently holding the burger, other times elegantly wiping crumbs, as if savoring the flavors or following the rhythm of enjoyment. The entire scene is filled with fluidity and charm, captivating the audience with its authenticity and expressiveness"
```
### Genshin Impact
2. "Animated Furina de Fontaine Waving"
- Source Image
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000002.safetensors \
  --lora_multiplier 1.0 \
  --image_path "fufu_clear.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is waving hello, with hands moving gracefully"
```
Because the LoRA was trained on white-background clips, source images with a clean white background work best. The same prompt with the epoch 5 checkpoint:
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000005.safetensors \
  --lora_multiplier 1.0 \
  --image_path "fufu_clear.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is waving hello, with hands moving gracefully"
```
3. "Animated Card Furina de Fontaine"
- Source Image
The following two runs use the same checkpoint and source image and differ only in the prompt ("waving hello" vs. "dancing"), which makes it easy to compare how the prompt steers the motion:

```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000005.safetensors \
  --lora_multiplier 1.0 \
  --image_path "fufu_card.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is waving hello, with hands moving gracefully"
```

```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000005.safetensors \
  --lora_multiplier 1.0 \
  --image_path "fufu_card.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is dancing, with hands moving gracefully"
```
### Honkai: Star Rail
3. "KFC"
- Source Image
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000005.safetensors \
  --lora_multiplier 1.3 \
  --image_path "三月七.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is drinking juice and dancing."
```

Note the higher `--lora_multiplier` of 1.3 here, which strengthens the LoRA's stylistic influence on the output.
- Source Image
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000005.safetensors \
  --lora_multiplier 1.0 \
  --image_path "丹恒.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is eating hamburger and dancing."
```
### Zenless Zone Zero
5. "Rope Craftsman"
- Source Image
```bash
python wan_generate_video.py --fp8 --video_size 832 480 --video_length 45 --infer_steps 20 \
  --save_path save --output_type both \
  --task i2v-14B --t5 models_t5_umt5-xxl-enc-bf16.pth --clip models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
  --dit wan2.1_i2v_480p_14B_fp8_e4m3fn.safetensors --vae Wan2.1_VAE.pth \
  --attn_mode torch \
  --lora_weight dancing_white_outputs/dancing_white_i2v_w14_lora-000005.safetensors \
  --lora_multiplier 1.0 \
  --image_path "绳匠.png" \
  --prompt "In the style of Yi Chen Dancing White Background, the video features an animated character. The character is waving hello, with hands moving gracefully"
```
## Parameters

- `--fp8`: Enable FP8 precision (optional).
- `--task`: Specify the task (e.g., `i2v-14B`).
- `--video_size`: Resolution of the generated video (e.g., `832 480`).
- `--video_length`: Length of the video in frames.
- `--infer_steps`: Number of inference steps.
- `--save_path`: Directory to save the generated video.
- `--output_type`: Output type (e.g., `both` for video and frames).
- `--dit`: Path to the diffusion model weights.
- `--vae`: Path to the VAE model weights.
- `--t5`: Path to the T5 text-encoder weights.
- `--clip`: Path to the CLIP model weights (used for image-to-video tasks).
- `--attn_mode`: Attention mode (e.g., `torch`).
- `--lora_weight`: Path to the LoRA weights.
- `--lora_multiplier`: Multiplier for the LoRA weights.
- `--image_path`: Path to the source image (image-to-video tasks).
- `--prompt`: Textual prompt for video generation.
## Output

The generated video and frames are saved in the directory given by `--save_path`.
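With `--output_type both` you get an MP4 plus individual frames. A minimal sketch for converting a generated video into a GIF preview with moviepy (pinned above to 1.0.3; `save/output.mp4` is an illustrative filename, so substitute the file actually produced):

```python
# Convert a generated MP4 from the save directory into a GIF preview.
# "save/output.mp4" is an illustrative name; use the file actually produced.
from moviepy.editor import VideoFileClip

clip = VideoFileClip("save/output.mp4")
clip.write_gif("save/output.gif", fps=12)
```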
## Troubleshooting

- Ensure all dependencies are correctly installed.
- Verify that the model weights are downloaded and placed in the correct locations.
- Check for any missing Python packages and install them using `pip`.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- Hugging Face for hosting the model weights.
- Wan-AI for providing the pre-trained models.
- DeepBeepMeep for contributing to the model weights.
## Contact

For any questions or issues, please open an issue on the repository or contact the maintainer.