Citlali Text-to-Video Generation

This repository contains the steps and scripts needed to generate videos with the Citlali text-to-video model. The model combines LoRA (Low-Rank Adaptation) weights with pre-trained Wan 2.1 components to produce anime-style videos from text prompts.

Prerequisites

Before proceeding, ensure that you have the following installed on your system:

• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg
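
As a quick sanity check, the command-line tools listed above can be looked up on PATH before installing anything else. This is a minimal sketch, not part of the repository: the helper name missing_tools and the executable names (python3, git-lfs, and so on) are assumptions about how these tools are installed on your system.

```python
import shutil

# Executable names assumed to correspond to the prerequisites listed above.
REQUIRED_TOOLS = ["python3", "pip", "git", "git-lfs", "ffmpeg"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisite tools were found.")
```

An empty result means every listed executable was found; it does not check versions.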

Installation

Update and Install Dependencies

sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg

Clone the Repository

git clone https://huggingface.co/svjack/Citlali_wan_2_1_14_B_text2video_lora
cd Citlali_wan_2_1_14_B_text2video_lora

Install Python Dependencies

pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install moviepy==1.0.3
pip install sageattention==1.0.6

Download Model Weights

wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
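
Because the weight files are large, an interrupted download is easy to miss. The sketch below checks that each file named in the wget commands above exists and is non-empty. The helper name missing_weights is hypothetical, and the check assumes the files were downloaded into the current directory; it does not verify checksums.

```python
from pathlib import Path

# File names taken from the download URLs above.
WEIGHT_FILES = [
    "models_t5_umt5-xxl-enc-bf16.pth",
    "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "Wan2.1_VAE.pth",
    "wan2.1_t2v_1.3B_bf16.safetensors",
    "wan2.1_t2v_14B_bf16.safetensors",
]


def missing_weights(directory=".", names=WEIGHT_FILES):
    """Return the weight files that are absent or empty in `directory`."""
    base = Path(directory)
    return [
        name
        for name in names
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    for name in missing_weights():
        print("Missing or empty:", name)
```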

Usage

To generate a video, run the wan_generate_video.py script with the appropriate parameters. The example below starts the script in interactive mode (--interactive); the prompts listed after the command are examples to use with the Citlali model.

14B usage

python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 --infer_steps 35 --video_length 25 \
--save_path save --output_type video \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight genshin_Citlali_w14_outputs/genshin_Citlali_w14_im_lora-step00007500.safetensors \
--lora_multiplier 1.0 \
--interactive
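
When the same command is run repeatedly with small variations, it can help to assemble it as an argument list in Python. The following is a sketch under stated assumptions: build_command and run_generation are hypothetical helpers, and the flags and file names are copied from the command above rather than taken from the script's source.

```python
import shlex
import subprocess
import sys


def build_command(lora_multiplier=1.0, infer_steps=35, video_length=25):
    """Assemble the 14B generation command shown above as an argument list."""
    return [
        sys.executable, "wan_generate_video.py",
        "--fp8",
        "--task", "t2v-14B",
        "--video_size", "480", "832",
        "--infer_steps", str(infer_steps),
        "--video_length", str(video_length),
        "--save_path", "save",
        "--output_type", "video",
        "--dit", "wan2.1_t2v_14B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        "--lora_weight",
        "genshin_Citlali_w14_outputs/genshin_Citlali_w14_im_lora-step00007500.safetensors",
        "--lora_multiplier", str(lora_multiplier),
        "--interactive",
    ]


def run_generation(**overrides):
    """Run the command from the repository root; raises if the script fails."""
    return subprocess.run(build_command(**overrides), check=True)


# Print the command for inspection without executing it.
print(shlex.join(build_command(lora_multiplier=0.8)))
```

Passing the arguments as a list avoids shell quoting issues; call run_generation() only from the directory that contains wan_generate_video.py and the downloaded weights.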

"Melodic Solitude"

In the style of Citlali, this is a digital anime-style illustration of a young woman with long, pastel purple hair and blue eyes, wearing a dark blue, sleeveless, high-collared outfit with gold and purple accents. She has large, circular, teal headphones on her head. She sits cross-legged on a sunlit park bench, fingers tapping rhythmically on her knees as she loses herself in the music, her hair gently swaying in the breeze.

"Starlit Serenade"

In the style of Citlali, this is a digital anime-style illustration of a young woman with long, pastel purple hair and blue eyes, wearing a dark blue, sleeveless, high-collared outfit with gold and purple accents. She has large, circular, teal headphones on her head. She lies on a grassy hill under a starry sky, arms behind her head, eyes closed as the music transports her to another world, her hair fanned out around her like a lavender halo.

"Rainy Reverie"

In the style of Citlali, this is a digital anime-style illustration of a young woman with long, pastel purple hair and blue eyes, wearing a dark blue, sleeveless, high-collared outfit with gold and purple accents. She has large, circular, teal headphones on her head. She stands by a rain-streaked window, fingertips pressed lightly against the glass, lost in thought as the pitter-patter of raindrops syncs with her playlist.

"Café Harmony"

In the style of Citlali, this is a digital anime-style illustration of a young woman with long, pastel purple hair and blue eyes, wearing a dark blue, sleeveless, high-collared outfit with gold and purple accents. She has large, circular, teal headphones on her head. She leans over a small café table, sketching in a notebook while sipping a latte, her headphones slightly tilted as she occasionally nods to the music.

"Urban Groove"

In the style of Citlali, this is a digital anime-style illustration of a young woman with long, pastel purple hair and blue eyes, wearing a dark blue, sleeveless, high-collared outfit with gold and purple accents. She has large, circular, teal headphones on her head. She walks down a neon-lit city street at night, her boots clicking against the pavement while she hums along to the beat, her headphones glowing softly in the dark.
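
The example prompts above share the same character description and differ only in the final scene sentence. A small helper can make that pattern explicit when writing new prompts. This is an illustrative sketch: build_prompt and the SCENES dictionary are hypothetical names, and the text is copied from the prompts above.

```python
# Shared character description, copied from the example prompts above.
CHARACTER = (
    "In the style of Citlali, this is a digital anime-style illustration of "
    "a young woman with long, pastel purple hair and blue eyes, wearing a "
    "dark blue, sleeveless, high-collared outfit with gold and purple "
    "accents. She has large, circular, teal headphones on her head."
)

# Two of the scene sentences from the examples above, keyed by title.
SCENES = {
    "Rainy Reverie": (
        "She stands by a rain-streaked window, fingertips pressed lightly "
        "against the glass, lost in thought as the pitter-patter of "
        "raindrops syncs with her playlist."
    ),
    "Urban Groove": (
        "She walks down a neon-lit city street at night, her boots clicking "
        "against the pavement while she hums along to the beat, her "
        "headphones glowing softly in the dark."
    ),
}


def build_prompt(scene):
    """Join the shared character description with one scene sentence."""
    return f"{CHARACTER} {scene}"


if __name__ == "__main__":
    for title, scene in SCENES.items():
        print(f"{title}:\n{build_prompt(scene)}\n")
```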

Use Wan 14B T2V
Use Wan FusionX 14B

Parameters

--fp8: Enable FP8 precision (optional).
--task: Specify the task (e.g., t2v-1.3B or t2v-14B).
--video_size: Set the resolution of the generated video as two integers (e.g., 480 832, as in the example above).
--video_length: Define the length of the video in frames.
--infer_steps: Number of inference steps.
--save_path: Directory to save the generated video.
--output_type: Output type (e.g., video for the video only, or both for video and frames).
--dit: Path to the diffusion model weights.
--vae: Path to the VAE model weights.
--t5: Path to the T5 model weights.
--attn_mode: Attention mode (e.g., torch).
--lora_weight: Path to the LoRA weights.
--lora_multiplier: Multiplier for LoRA weights.
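--prompt: Textual prompt for video generation.
--interactive: Run in interactive mode, as in the example above, where prompts are entered at run time.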
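
Since --video_length is given in frames, it can be useful to convert it to a clip duration. The sketch below rests on two assumptions that this README does not state: that output is written at 16 frames per second, and that Wan 2.1 expects frame counts of the form 4n+1. Treat both as values to verify against the Wan 2.1 documentation; the helper names are hypothetical.

```python
ASSUMED_FPS = 16  # assumption: frame rate is not stated in this README


def clip_seconds(video_length, fps=ASSUMED_FPS):
    """Duration in seconds of a clip with `video_length` frames."""
    return video_length / fps


def is_4n_plus_1(video_length):
    """Check the assumed 4n+1 frame-count constraint (1, 5, 9, ..., 25, ...)."""
    return video_length >= 1 and video_length % 4 == 1


if __name__ == "__main__":
    frames = 25  # value used in the 14B example
    print(f"{frames} frames at {ASSUMED_FPS} fps = {clip_seconds(frames)} s")
    print("Frame count has the form 4n+1:", is_4n_plus_1(frames))
```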

Output

The generated video is saved in the directory given by --save_path. Individual frames are saved as well when --output_type is set to both.
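
To locate the most recent results after a run, the save directory can be listed by modification time. This is a minimal sketch: newest_videos is a hypothetical helper, the default directory save matches the example command, and the .mp4 extension is an assumption about the output file format.

```python
from pathlib import Path


def newest_videos(save_path="save", pattern="*.mp4", limit=5):
    """Return up to `limit` matching files in `save_path`, newest first."""
    files = sorted(
        Path(save_path).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return files[:limit]


if __name__ == "__main__":
    for path in newest_videos():
        print(path)
```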

Troubleshooting

• Ensure all dependencies are correctly installed.
• Verify that the model weights are downloaded and placed in the correct locations.
• Check for any missing Python packages and install them using pip.
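
The check for missing Python packages can be sketched with the standard library. The import names below are assumed to match the packages installed in the installation steps (for example, huggingface_hub); the list is illustrative and does not cover everything in requirements.txt. The helper name missing_packages is hypothetical.

```python
import importlib.util

# Import names assumed for packages installed in the steps above.
PACKAGES = [
    "torch",
    "torchvision",
    "matplotlib",
    "tensorboard",
    "huggingface_hub",
    "datasets",
    "moviepy",
    "sageattention",
]


def missing_packages(names=PACKAGES):
    """Return the top-level modules that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed packages can be located.")
```

This only confirms that each module can be found; it does not import it or check its version.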

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

• Hugging Face for hosting the model weights.
• Wan-AI for providing the pre-trained models.
• DeepBeepMeep for contributing to the model weights.

Contact

For any questions or issues, please open an issue on the repository or contact the maintainer.