merged doc

- README.md +58 -405
- docs/CHANGELOG.md +150 -0
- docs/CLI.md +226 -0
- docs/GETTING_STARTED.md +194 -0
- docs/INSTALLATION.md +170 -0
- docs/LORAS.md +197 -0
- docs/MODELS.md +268 -0
- docs/TROUBLESHOOTING.md +338 -0
- docs/VACE.md +190 -0
- hyvideo/diffusion/pipelines/pipeline_hunyuan_video_audio.py +1 -1
- wgp.py +40 -12
README.md
CHANGED
@@ -15,442 +15,95 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models
- Loras Support to customize each model
- Queuing system: make your shopping list of videos to generate and come back later

**Discord Server to get Help from Other Users and show your Best Videos:** https://discord.gg/g7efUW9jGV

* May 26 2025: 👋 WanGP v5.3 : Settings management: there are now multiple ways to reuse the settings of a previous generation:
- Select one Video recently generated in the Video Gallery and click *Use Selected Video Settings*
- Click *Drop File Here* and select a Video you saved somewhere; if the settings metadata have been saved with the Video you will be able to extract them automatically
- Click *Export Settings to File* to save the current settings to your hard drive. You will be able to use them later again by clicking *Drop File Here* and selecting a Settings json file this time

* May 23 2025: 👋 WanGP v5.21 : Improvements for Vace: better transitions between Sliding Windows, support for Image masks in Matanyone, new Extend Video for Vace, different types of automated background removal

* May 20 2025: 👋 WanGP v5.2 : Added support for Wan CausVid, which is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps.
The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model like Wan Vace 14B.
See instructions below on how to use CausVid.\
Also, as an experiment, I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (if you have enough VRAM (20GB...) and are ready to wait for a long time...). Don't hesitate to share your impressions on the Discord server.

* May 18 2025: 👋 WanGP v5.1 : Bonus Day, added LTX Video 13B Distilled: generate very high quality Videos in less than one minute!

* May 17 2025: 👋 WanGP v5.0 : One App to Rule Them All!\
Added support for the other great open source architectures:
- Hunyuan Video: text 2 video (one of the best, if not the best t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
- LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and reduced LTX Video VRAM requirements by 4!

Also:
- Added support for the best Control Video Model, released 2 days ago: Vace 14B
- New integrated prompt enhancer to increase the quality of the generated videos

You will need one more *pip install -r requirements.txt*

* May 5 2025: 👋 WanGP v4.5: FantasySpeaking model, you can animate a talking head using a voice track. This works not only on people but also on objects. Also better seamless transitions between Vace sliding windows for very long videos (see recommended settings). New high quality processing features (mixed 16/32 bits calculation and 32 bits VAE)

* April 27 2025: 👋 WanGP v4.4: Phantom model support, very good model to transfer people or objects into video, works quite well at 720p and with the number of steps > 30

* April 25 2025: 👋 WanGP v4.3: Added preview mode and support for Sky Reels v2 Diffusion Forcing for high quality "infinite length videos" (see Sliding Window section below). Note that Skyreel uses causal attention that is only supported by Sdpa attention, so even if you choose another type of attention, some of the processes will use Sdpa attention.

* April 18 2025: 👋 WanGP v4.2: FLF2V model support, official support from Wan for image2video start and end frames, specialized for 720p.

* April 17 2025: 👋 WanGP v4.1: Recam Master model support, view a video from a different angle. The video to process must be at least 81 frames long and you should set at least 15 denoising steps to get good results.

* April 13 2025: 👋 WanGP v4.0: lots of goodies for you!
- A new UI, tabs were replaced by a Dropdown box to easily switch models
- A new queuing system that lets you stack in a queue as many text2video, image2video tasks, ... as you want. Each task can rely on completely different generation parameters (different number of frames, steps, loras, ...). Many thanks to **Tophness** for being a big contributor on this new feature
- Temporal upsampling (Rife) and spatial upsampling (Lanczos) for a smoother video (32 fps or 64 fps) and to enlarge your video by x2 or x4. Check these new advanced options.
- Wan Vace Control Net support: with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ... I have provided an introduction guide below.
- Integrated *Matanyone* tool directly inside WanGP so that you can easily create inpainting masks used in Vace
- Sliding Window generation for Vace, create windows that can last dozens of seconds
- New optimisations for old generation GPUs: generate 5s (81 frames, 15 steps) of Vace 1.3B with only 5GB and in only 6 minutes on a RTX 2080Ti, and 5s of t2v 14B in less than 10 minutes.

* Mar 27 2025: 👋 Added support for the new Wan Fun InP models (image2video). The 14B Fun InP has probably better end image support but unfortunately existing loras do not work so well with it. The great novelty is the Fun InP image2video 1.3B model: Image 2 Video is now accessible to even lower hardware configurations. It is not as good as the 14B models but very impressive for its size. You can choose any of those models in the Configuration tab. Many thanks to the VideoX-Fun team (https://github.com/aigc-apps/VideoX-Fun)

* Mar 26 2025: 👋 Good news! Official support for RTX 50xx, please check the installation instructions below.

* Mar 24 2025: 👋 Wan2.1GP v3.2:
- Added Classifier-Free Guidance Zero Star. The video should match the text prompt better (especially with text2video) at no performance cost: many thanks to the **CFG Zero * Team:**\
Don't hesitate to give them a star if you appreciate the results: https://github.com/WeichenFan/CFG-Zero-star
- Added back support for Pytorch compilation with Loras. It seems it had been broken for some time
- Added the possibility to keep a number of pregenerated videos in the Video Gallery (useful to compare outputs of different settings)

You will need one more *pip install -r requirements.txt*

* Mar 19 2025: 👋 Wan2.1GP v3.1: Faster launch and RAM optimizations (should require less RAM to run)\
You will need one more *pip install -r requirements.txt*

* Mar 18 2025: 👋 Wan2.1GP v3.0:
- New Tab based interface, you can switch from i2v to t2v and conversely without restarting the app
- Experimental Dual Frames mode for i2v, you can also specify an End frame. It doesn't always work, so you will need a few attempts.
- You can save default settings in the files *i2v_settings.json* and *t2v_settings.json* that will be used when launching the app (you can also specify the path to different settings files)
- Slight acceleration with loras\
You will need one more *pip install -r requirements.txt*

Many thanks to *Tophness* who created the framework (and did a big part of the work) of the multitabs and saved settings features

* Mar 18 2025: 👋 Wan2.1GP v2.11: Added more command line parameters to prefill the generation settings + customizable output directory and choice of type of metadata for generated videos. Many thanks to *Tophness* for his contributions. You will need one more *pip install -r requirements.txt* to reflect new dependencies\

* Mar 18 2025: 👋 Wan2.1GP v2.1: More Loras!: added support for 'Safetensors' and 'Replicate' Lora formats.\
You will need to refresh the requirements with a *pip install -r requirements.txt*

* Mar 17 2025: 👋 Wan2.1GP v2.0: The Lora festival continues:
- Clearer user interface
- Download 30 Loras in one click to try them all (expand the info section)
- Loras are now very easy to use: Lora presets can input the subject (or other needed terms) of the Lora so that you don't have to modify a prompt manually
- Added basic macro prompt language to prefill prompts with different values. With one prompt template, you can generate multiple prompts.
- New Multiple images prompts: you can now combine any number of images with any number of text prompts (need to launch the app with --multiple-images)
- New command line options to launch directly the 1.3B t2v model or the 14B t2v model

* Mar 14, 2025: 👋 Wan2.1GP v1.7:
- Lora Fest special edition: very fast loading / unloading of loras for those Lora collectors around. You can also now add / remove loras in the Lora folder without restarting the app. You will need to refresh the requirements with *pip install -r requirements.txt*
- Added experimental Skip Layer Guidance (advanced settings), that should improve the image quality at no extra cost. Many thanks to *AmericanPresidentJimmyCarter* for the original implementation

* Mar 13, 2025: 👋 Wan2.1GP v1.6: Better Loras support, accelerated loading of Loras. You will need to refresh the requirements with *pip install -r requirements.txt*

* Mar 10, 2025: 👋 Wan2.1GP v1.5: Official Teacache support + Smart Teacache (finds automatically the best parameters for a requested speed multiplier), 10% speed boost with no quality loss, improved lora presets (they can now include prompts and comments to guide the user)

* Mar 07, 2025: 👋 Wan2.1GP v1.4: Fixed Pytorch compilation, now it is really 20% faster when activated

* Mar 04, 2025: 👋 Wan2.1GP v1.3: Support for Image to Video with multiple images for different images / prompts combinations (requires the *--multiple-images* switch), and added command line *--preload x* to preload x MB of the main diffusion model in VRAM if you find there is too much unused VRAM and you want to (slightly) accelerate the generation process.\
If you upgrade you will need to do a *pip install -r requirements.txt* again.

* Mar 04, 2025: 👋 Wan2.1GP v1.2: Implemented tiling on VAE encoding and decoding. No more VRAM peaks at the beginning and at the end

* Mar 03, 2025: 👋 Wan2.1GP v1.1: added Tea Cache support for faster generations: optimization of kijai's implementation (https://github.com/kijai/ComfyUI-WanVideoWrapper/) of teacache (https://github.com/ali-vilab/TeaCache)

* Mar 02, 2025: 👋 Wan2.1GP by DeepBeepMeep v1 brings:
- Support for all Wan models including the Image to Video model
- Reduced memory consumption by 2, with the possibility to generate more than 10s of video at 720p with a RTX 4090 and 10s of video at 480p with less than 12GB of VRAM. Many thanks to RIFLEx (https://github.com/thu-ml/RIFLEx) for their algorithm that allows generating nice looking videos longer than 5s.
- The usual perks: web interface, multiple generations, loras support, sage attention, auto download of models, ...

* Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1.

* Feb 27, 2025: 👋 Wan2.1 has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!

## Installation Guide for Linux and Windows for GPUs up to RTX40xx

**If you are looking for a one click installation, just go to the Pinokio App store: https://pinokio.computer/**\
Otherwise you will find the instructions below:

This app has been tested on Python 3.10 / Pytorch 2.6.0 / Cuda 12.4.

```shell
# 0 Download the source and create a Python 3.10.9 environment using conda or create a venv using python
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP
conda create -n wan2gp python=3.10.9
conda activate wan2gp

# 1 Install pytorch 2.6.0:
pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124

# 2. Install pip dependencies
pip install -r requirements.txt

# 3.1 optional Sage attention support (30% faster)
# Windows only: extra step only needed for windows as triton is included in pytorch with the Linux version of pytorch
pip install triton-windows
# For both Windows and Linux
pip install sageattention==1.0.6

# 3.2 optional Sage 2 attention support (40% faster)
# Windows only
pip install triton-windows
pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl
# Linux only (sorry, only manual compilation for the moment, but it is straightforward with Linux)
git clone https://github.com/thu-ml/SageAttention
cd SageAttention
pip install -e .
```

In order to install Sage, you will also need to install Triton. If Triton is installed you can turn on *Pytorch Compilation*, which will give you an additional 20% speed boost and reduced VRAM consumption.

It is important to use Python 3.10, otherwise the pip wheels may not be compatible.
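
To quickly confirm the stack you just installed, here is a minimal check (it only assumes a standard CUDA build of Pytorch):

```shell
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```

It should report a 2.6.0 build with Cuda 12.4 and `True` for CUDA availability.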

## Installation Guide for Linux and Windows for GPUs RTX50xx

```shell
# 0 Download the source and create a Python 3.10.9 environment using conda or create a venv using python
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP
conda create -n wan2gp python=3.10.9
conda activate wan2gp

# 1 Install pytorch 2.7.0:
pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128

# 2. Install pip dependencies
pip install -r requirements.txt

# 3.1 optional Sage attention support (30% faster)
# Windows only: extra step only needed for windows as triton is included in pytorch with the Linux version of pytorch
pip install triton-windows
# For both Windows and Linux
pip install sageattention==1.0.6

# 3.2 optional Sage 2 attention support (40% faster)
# Windows only
pip install triton-windows
pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl
# Linux only (sorry, only manual compilation for the moment, but it is straightforward with Linux)
git clone https://github.com/thu-ml/SageAttention
cd SageAttention
pip install -e .
```

## Run the application

### Run a Gradio Server on port 7860 (recommended)

To run the text to video generator (in Low VRAM mode):
```bash
python wgp.py
#or
python wgp.py --t2v      #launch the default Wan text 2 video model
#or
python wgp.py --t2v-14B  #for the Wan 14B model
#or
python wgp.py --t2v-1-3B #for the Wan 1.3B model
```

To run the image to video generator (in Low VRAM mode):
```bash
python wgp.py --i2v
```

To run the 1.3B Fun InP image to video generator (in Low VRAM mode):
```bash
python wgp.py --i2v-1-3B
```

To be able to input multiple images with the image to video generator:
```bash
python wgp.py --i2v --multiple-images
```

Within the application you can configure which video generator will be launched without specifying a command line switch.

To run the application while loading the diffusion model entirely in VRAM (slightly faster, but requires 24 GB of VRAM for an 8-bit quantized 14B model):
```bash
python wgp.py --profile 3
```

If you have installed Sage attention, it may seem that it works because *pip install sageattention* didn't produce an error, or because sage is offered as an option, but in fact it doesn't work: in order to be fully operational, Sage needs to compile its triton kernels the first time it is run (that is, the first time you try to generate a video).

Sometimes fixing Sage compilation is easy (clear the triton cache, check triton is properly installed); sometimes it is simply not possible because Sage is not supported on some older GPUs.

Therefore you may have no choice but to fall back to sdpa attention. To do so:
- In the configuration menu inside the application, switch "Attention mode" to "sdpa"

or
- Launch the application this way:
```bash
python wgp.py --attention sdpa
```

### Loras

Loras for the Wan models are stored in the subfolders 'loras' for t2v and 'loras_i2v' for i2v. You will then be able to activate / deactivate any of them when running the application by selecting them in the Advanced Tab "Loras".

If you want to manage Loras for the 1.3B and 14B Wan t2v models in separate areas (as they are not compatible), just create the following subfolders (one way to create them is sketched after this list):
- loras/1.3B
- loras/14B
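
A minimal sketch, run from inside the Wan2GP directory:

```bash
mkdir -p loras/1.3B loras/14B
```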

You can also point the app to custom Lora directories when launching it:
```bash
python wgp.py --lora-dir path --lora-dir-i2v path
```

Loras for the other model families have their own folders:
- loras_hunyuan
- loras_hunyuan_i2v
- loras_ltxv

Each activated Lora can be given its own multiplier; if a multiplier is made of several comma-separated values, it will vary accordingly across the denoising steps. For instance, one line per Lora:
```
0.9,0.8,0.7
1.2,1.1,1.0
```

You can edit, save or delete Lora presets (combinations of Loras with their corresponding multipliers) directly from the gradio Web interface. These presets will save the *comment* part of the prompt, which should contain some instructions on how to use the corresponding Loras (for instance by specifying a trigger word or providing an example). A comment in the prompt is a line that starts with a #; it will be ignored by the video generator. For instance:

```
# use the keyword ohnvx to trigger the Lora
A ohnvx is driving a car
```

Each preset is a file with a ".lset" extension stored in the loras directory and can be shared with other users.

Last but not least, you can pre-activate the corresponding Loras and prefill a prompt (comments only or full prompt) by specifying a preset when launching the gradio server:
```bash
python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a preset stored in the 'loras' folder
```

You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.

### CausVid Lora

Wan CausVid is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps. Moreover, as a distilled model it doesn't require CFG and is two times faster for the same number of steps.
The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, like Wan Vace 14B, to accelerate other models too. It may also work with Wan i2v models.

Instructions:
1) First download the Lora: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
2) Choose a Wan t2v model (for instance Wan 2.1 text2video 14B or Vace 14B)
3) Turn on the Advanced Mode by checking the corresponding checkbox
4) In the Advanced Generation Tab: select Guidance Scale = 1, Shift Scale = 7
5) In the Advanced Lora Tab: select the CausVid Lora (click the Refresh button at the top if you don't see it), and enter 0.3 as the Lora multiplier
6) Now select a 12 steps generation and click Generate

You can reduce the number of steps to as low as 4, but you will need to progressively increase the Lora multiplier up to 1 at the same time. Please note that the lower the number of steps, the lower the quality (especially the motion).

You can combine the CausVid Lora with other Loras (just follow the instructions above).
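
If you use this CausVid setup often, you can save it as a Lora preset and preselect the step count from the command line. A minimal sketch, assuming you saved a hypothetical preset named causvid.lset in the 'loras' folder:

```bash
# causvid.lset is a hypothetical preset created as described above;
# Guidance Scale, Shift Scale and the Lora multiplier still come from the preset / UI
python wgp.py --t2v-14B --lora-preset causvid.lset --steps 12
```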

### Macros (basic)
In *Advanced Mode*, you can start prompt lines with a "!", for instance:\
```
! {Subject}="cat","woman","man", {Location}="forest","lake","city", {Possessive}="its", "her", "his"
In the video, a {Subject} is presented. The {Subject} is in a {Location} and looks at {Possessive} watch.
```

This will automatically create 3 prompts that will cause the generation of 3 videos:
```
In the video, a cat is presented. The cat is in a forest and looks at its watch.
In the video, a woman is presented. The woman is in a lake and looks at her watch.
In the video, a man is presented. The man is in a city and looks at his watch.
```

You can define multiple lines of macros. If there is only one macro line, the app will generate a simple user interface to enter the macro variables when getting back to *Normal Mode* (advanced mode turned off).

### VACE ControlNet introduction

Vace is a ControlNet that allows you to do Video to Video and Reference to Video (inject your own images into the output video). It is probably one of the most powerful Wan models and you will be able to do amazing things when you master it: inject people or objects of your choice into the scene, animate a person, perform inpainting or outpainting, continue a video, ...

First you need to select the Vace 1.3B model or the Vace 14B model in the Drop Down box at the top. Please note that for the moment Vace works well only with videos up to 7s with the Riflex option turned on.

Besides the usual Text Prompt, three new types of visual hints can be provided (and combined!):
- *a Control Video*\
Based on your choice, you can decide to transfer the motion or the depth to a new Video. You can tell WanGP to use only the first n frames of the Control Video and to extrapolate the rest. You can also do inpainting: if the video contains areas of grey color 127, they will be considered as masks and will be filled based on the Text prompt and the Reference Images.

- *Reference Images*\
A Reference Image can be a background that you want to use as a setting for the video, or people or objects of your choice that you want to inject into the video. You can select multiple Reference Images. The integration of an object / person image is more efficient if its background is replaced by full white. For complex background removal you can use the Image version of the Matanyone tool that is embedded with WanGP, or you can use the fast on-the-fly background remover by selecting an option in the *Remove background* drop down box. Be careful not to remove the background of a reference image that is a landscape or setting (always the first reference image) that you want to use as a start image / background for the video. It helps greatly to reference and describe explicitly the injected objects / people of the Reference Images in the text prompt.

- *a Video Mask*\
This offers a stronger mechanism to tell Vace which parts should be kept (black) or replaced (white). You can also do inpainting / outpainting, or fill the missing parts of a video, more efficiently than with just the video hint. For instance, if a video mask is white except at the beginning and at the end where it is black, the first and last frames will be kept and everything in between will be generated.

Examples:
- Inject people and / or objects into a scene described by a text prompt: Ref. Images + text Prompt
- Animate a character described in a text prompt: a Video of a person moving + text Prompt
- Animate a character of your choice (motion transfer): Ref. Images + a Video of a person moving + text Prompt
- Change the style of a scene (depth transfer): a Video that contains objects / persons at different depths + text Prompt

There are lots of possible combinations. Some of them require you to prepare some materials (masks on top of a video, full masks, etc...).

Vace provides on its GitHub (https://github.com/ali-vilab/VACE/tree/main/vace/gradios) an annotators / preprocessors Gradio tool that can help you build some of these materials depending on the task you want to achieve.

There is also a guide that describes the various combinations of hints (https://github.com/ali-vilab/VACE/blob/main/UserGuide.md). Good luck!

It seems you will get better results with Vace if you turn on "Skip Layer Guidance" with its default configuration.

Other recommended settings for Vace:
- Use a long prompt description, especially for the people / objects that are in the background and not in reference images. This will ensure consistency between the windows.
- Set a medium sized overlap window: long enough to give the model a sense of the motion, but short enough so that any overlapped blurred frames do not turn the rest of the video into a blurred video
- Truncate at least the last 4 frames of each generated window, as Vace's last frames tend to be blurry

**WanGP integrates the Matanyone tool, which is tuned to work with Vace.**

This can be very useful to create at the same time a control video and a mask video that go together.\
For example, if you want to replace the face of a person in a video:
- load the video in the Matanyone tool
- click the face on the first frame and create a mask for it (if you have trouble selecting only the face, look at the tips below)
- generate both the control video and the mask video by clicking *Generate Video Matting*
- click *Export to current Video Input and Video Mask*
- in the *Reference Image* field of the Vace screen, load a picture of the replacement face

Please note that sometimes it may be useful to create *Background Masks*, for instance if you want to replace everything but a character that is in the video. You can do that by selecting *Background Mask* in the *Matanyone settings*.

If you have some trouble creating the perfect mask, be aware of these tips:
- Using the Matanyone Settings you can also define Negative Point Prompts to remove parts of the current selection.
- Sometimes it is very hard to fit everything you want in a single mask; it may be much easier to combine multiple independent sub Masks before producing the Matting: each sub Mask is created by selecting an area of an image and by clicking the Add Mask button. Sub Masks can then be enabled / disabled in the Matanyone settings.

### VACE, Sky Reels v2 Diffusion Forcing Sliding Window and LTX Video
With this mode (that works for the moment only with Vace, Sky Reels v2 and LTX Video) you can merge multiple Videos to form a very long video (up to 1 min).

When combined with Vace, this feature can use the same control video to generate the full Video that results from concatenating the different windows. For instance the first 0-4s of the control video will be used to generate the first window, then the next 4-8s of the control video will be used to generate the second window, and so on. So if your control video contains a person walking, your generated video could contain up to one minute of this person walking.

When combined with Sky Reels V2, you can extend an existing video indefinitely.

Sliding Windows are turned on by default and are triggered as soon as you try to generate a Video longer than the Window Size. You can go to the Advanced Settings Tab *Sliding Window* to set this Window Size. You can make the Video even longer during the generation process by adding one more Window to generate each time you click the "Extend the Video Sample, Please!" button.

Although the window duration is set by the *Sliding Window Size* form field, the actual number of frames generated by each iteration will be less, because of the *overlap frames* and *discard last frames*:
- *overlap frames*: the first frames of a new window are filled with the last frames of the previous window in order to ensure continuity between the two windows
- *discard last frames*: sometimes (Vace 1.3B model only) the last frames of a window have a worse quality. You can decide here how many ending frames of a new window should be dropped.

There is some inevitable quality degradation over time due to accumulated calculation errors. One trick to reduce or hide it is to add some noise (usually not noticeable) on the overlapped frames using the *add overlapped noise* option.

Number of Generated Frames = [Number of Windows - 1] * ([Window Size] - [Overlap Frames] - [Discard Last Frames]) + [Window Size]
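
For instance, with a Window Size of 81 frames, 8 overlap frames and 4 discarded frames (illustrative values), 3 windows yield (3 - 1) * (81 - 8 - 4) + 81 = 219 generated frames.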

Experimental: if your prompt is broken into multiple lines (each line separated by a carriage return), then each line of the prompt will be used for a new window. If there are more windows to generate than prompt lines, the last prompt line will be repeated.
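
A minimal sketch of such a multi-window prompt, one line per window (the scene is only an illustration):

```
A person walks through a forest.
The person reaches a lake and stops at the shore.
The person sits down and looks at the water.
```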

### Command line parameters for Gradio Server
--i2v : launch the image to video generator\
--t2v : launch the text to video generator (default defined in the configuration)\
--t2v-14B : launch the 14B model text to video generator\
--t2v-1-3B : launch the 1.3B model text to video generator\
--i2v-14B : launch the 14B model image to video generator\
--i2v-1-3B : launch the Fun InP 1.3B model image to video generator\
--vace : launch the Vace ControlNet 1.3B model image to video generator\
--quantize-transformer bool : (default True) enable / disable on the fly transformer quantization\
--lora-dir path : path of a directory that contains Wan t2v Loras\
--lora-dir-i2v path : path of a directory that contains Wan i2v Loras\
--lora-dir-hunyuan path : path of a directory that contains Hunyuan t2v Loras\
--lora-dir-hunyuan-i2v path : path of a directory that contains Hunyuan i2v Loras\
--lora-dir-ltxv path : path of a directory that contains LTX Video Loras\
--lora-preset preset : name of a preset file (without the extension) to preload\
--verbose level : (default 1) level of information between 0 and 2\
--server-port portno : (default 7860) Gradio port no\
--server-name name : (default localhost) Gradio server name\
--open-browser : open the browser automatically when launching the Gradio Server\
--lock-config : prevent modifying the video engine configuration from the interface\
--share : create a shareable URL on huggingface so that your server can be accessed remotely\
--multiple-images : allow the users to choose multiple images as different starting points for new videos\
--compile : turn on pytorch compilation\
--attention mode : force attention mode among sdpa, flash, sage, sage2\
--profile no : (default 4) number of the profile, between 1 and 5\
--preload no : number of megabytes of the diffusion model to preload in VRAM; may offer speed gains on older hardware, while on recent hardware (RTX 30XX, RTX 40XX and RTX 50XX) the speed gain is only 10% and not worth it. Works only with profiles 2 and 4.\
--seed no : set the default seed value\
--frames no : set the default number of frames to generate\
--steps no : set the default number of denoising steps\
--teacache multiplier : Tea Cache speed multiplier, choices=["0", "1.5", "1.75", "2.0", "2.25", "2.5"]\
--slg : turn on skip layer guidance for improved quality\
--check-loras : filter loras that are incompatible (will take a few seconds while refreshing the lora list or while starting the app)\
--advanced : turn on the advanced mode while launching the app\
--listen : make the server accessible on the network\
--gpu device : run Wan on the specified device, for instance "cuda:1"\
--settings path : path of a folder that contains the default settings for all the models\
--fp16 : force the use of fp16 versions of models instead of bf16 versions\
--perc-reserved-mem-max float_less_than_1 : max percentage of RAM to allocate to reserved RAM, allows faster RAM<->VRAM transfers. The value should remain below 0.5 to keep the OS stable\
--theme theme_name : load the UI with the specified Theme Name; so far only two are supported, "default" and "gradio". You may submit your own nice looking Gradio theme and I will add them
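
For instance, to make the server reachable on the network with a fixed seed and sdpa attention (illustrative values):

```bash
python wgp.py --listen --server-port 7860 --attention sdpa --seed 42
```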

### Profiles (for power users only)
You can choose between 5 profiles, but two are really relevant here:
- LowRAM_HighVRAM (3): loads the model entirely in VRAM if possible; slightly faster, but less VRAM available for the video data after that
- LowRAM_LowVRAM (4): loads only the part of the model that is needed; low VRAM and low RAM requirements, but slightly slower

You can adjust the number of megabytes of the model to preload with --preload nnn (nnn is the number of megabytes to preload), as in the sketch below.
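
A minimal sketch (the 2000 MB figure is only an illustration):

```bash
# profile 4 with 2000 MB of the diffusion model preloaded in VRAM
python wgp.py --profile 4 --preload 2000
```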

### Other Models for the GPU Poor

- HunyuanVideoGP: https://github.com/deepbeepmeep/HunyuanVideoGP :\
One of the best open source Text to Video generators
- Hunyuan3D-2GP: https://github.com/deepbeepmeep/Hunyuan3D-2GP :\
A great image to 3D and text to 3D tool by the Tencent team. Thanks to mmgp it can run with less than 6 GB of VRAM
- FluxFillGP: https://github.com/deepbeepmeep/FluxFillGP :\
One of the best inpainting / outpainting tools based on Flux that can run with less than 12 GB of VRAM.
- Cosmos1GP: https://github.com/deepbeepmeep/Cosmos1GP :\
This application includes two models: a text to world generator and an image / video to world generator (probably the best open source image to video generator).
- OminiControlGP: https://github.com/deepbeepmeep/OminiControlGP :\
A very powerful Flux-derived application that can be used to transfer an object of your choice into a prompted scene. With mmgp you can run it with only 6 GB of VRAM.
- YuE GP: https://github.com/deepbeepmeep/YuEGP :\
A great song generator (instruments + singer's voice) based on prompted Lyrics and a genre description. Thanks to mmgp you can run it with less than 10 GB of VRAM without waiting forever.

## 🔥 Latest Updates

### May 28 2025: WanGP v5.4
👋 World Exclusive: Hunyuan Video Avatar Support! You won't need 80 GB of VRAM nor 32 GB of VRAM, just 10 GB of VRAM will be sufficient to generate up to 15s of high quality speech / song driven Video at high speed with no quality degradation. Support for TeaCache included.\
Also many thanks to Reevoy24 for repackaging / completing the documentation

### May 28 2025: WanGP v5.31
👋 Added Phantom 14B, a model that you can use to transfer objects / people into a video. My preference still goes to Vace, which remains the king of controlnets.
VACE improvements: better sliding window transitions, image mask support in Matanyone, new Extend Video feature, and enhanced background removal options.

### May 26, 2025: WanGP v5.3
👋 Settings management revolution! Now you can:
- Select any generated video and click *Use Selected Video Settings* to instantly reuse its configuration
- Drag & drop videos to automatically extract their settings metadata
- Export / import settings as JSON files for easy sharing and backup

### May 20, 2025: WanGP v5.2
👋 **CausVid support** - Generate videos in just 4-12 steps with the new distilled Wan model! Also added experimental MoviiGen for 1080p generation (20GB+ VRAM required).

### May 18, 2025: WanGP v5.1
👋 **LTX Video 13B Distilled** - Generate high-quality videos in less than one minute!

### May 17, 2025: WanGP v5.0
👋 **One App to Rule Them All!** Added Hunyuan Video and LTX Video support, plus Vace 14B and an integrated prompt enhancer.

See full changelog: **[Changelog](docs/CHANGELOG.md)**

## 📋 Table of Contents

- [🚀 Quick Start](#-quick-start)
- [📦 Installation](#-installation)
- [🎯 Usage](#-usage)
- [📚 Documentation](#-documentation)
- [🔗 Related Projects](#-related-projects)

## 🚀 Quick Start

**One-click installation:** Get started instantly with [Pinokio App](https://pinokio.computer/)

**Manual installation:**
```bash
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP
conda create -n wan2gp python=3.10.9
conda activate wan2gp
pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
pip install -r requirements.txt
```

**Run the application:**
```bash
python wgp.py        # Text-to-video (default)
python wgp.py --i2v  # Image-to-video
```

## 📦 Installation

For detailed installation instructions for different GPU generations:
- **[Installation Guide](docs/INSTALLATION.md)** - Complete setup instructions for RTX 10XX to RTX 50XX

## 🎯 Usage

### Basic Usage
- **[Getting Started Guide](docs/GETTING_STARTED.md)** - First steps and basic usage
- **[Models Overview](docs/MODELS.md)** - Available models and their capabilities

### Advanced Features
- **[Loras Guide](docs/LORAS.md)** - Using and managing Loras for customization
- **[VACE ControlNet](docs/VACE.md)** - Advanced video control and manipulation
- **[Command Line Reference](docs/CLI.md)** - All available command line options

## 📚 Documentation

- **[Changelog](docs/CHANGELOG.md)** - Latest updates and version history
- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions

## 🔗 Related Projects
### Other Models for the GPU Poor

- **[HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP)** - One of the best open source Text to Video generators
- **[Hunyuan3D-2GP](https://github.com/deepbeepmeep/Hunyuan3D-2GP)** - Image to 3D and text to 3D tool
- **[FluxFillGP](https://github.com/deepbeepmeep/FluxFillGP)** - Inpainting/outpainting tools based on Flux
- **[Cosmos1GP](https://github.com/deepbeepmeep/Cosmos1GP)** - Text to world generator and image/video to world generator
- **[OminiControlGP](https://github.com/deepbeepmeep/OminiControlGP)** - Flux-derived application for object transfer
- **[YuE GP](https://github.com/deepbeepmeep/YuEGP)** - Song generator with instruments and singer's voice

---

<p align="center">
Made with ❤️ by DeepBeepMeep
</p>

docs/CHANGELOG.md
ADDED
@@ -0,0 +1,150 @@
# Changelog

## 🔥 Latest News
### May 28 2025: WanGP v5.4
👋 World Exclusive: Hunyuan Video Avatar Support! You won't need 80 GB of VRAM nor 32 GB of VRAM, just 10 GB of VRAM will be sufficient to generate up to 15s of high quality speech / song driven Video at high speed with no quality degradation. Support for TeaCache included.

### May 26, 2025: WanGP v5.3
👋 Happy with a Video generation and want to do more generations using the same settings but you can't remember what you did, or you find it too hard to copy/paste each setting one by one from the file metadata? Rejoice! There are now multiple ways to turn this tedious process into a one click task:
- Select one Video recently generated in the Video Gallery and click *Use Selected Video Settings*
- Click *Drop File Here* and select a Video you saved somewhere; if the settings metadata have been saved with the Video you will be able to extract them automatically
- Click *Export Settings to File* to save the current settings to your hard drive. You will be able to use them again later by clicking *Drop File Here* and selecting a Settings json file this time

### May 23, 2025: WanGP v5.21
👋 Improvements for Vace: better transitions between Sliding Windows, support for Image masks in Matanyone, new Extend Video for Vace, different types of automated background removal

### May 20, 2025: WanGP v5.2
👋 Added support for Wan CausVid, which is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps. The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model like Wan Vace 14B. See [LORAS.md](LORAS.md) for instructions on how to use CausVid.

Also as an experiment I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (if you have enough VRAM (20GB...) and are ready to wait for a long time...). Don't hesitate to share your impressions on the Discord server.

### May 18, 2025: WanGP v5.1
👋 Bonus Day, added LTX Video 13B Distilled: generate very high quality Videos in less than one minute!

### May 17, 2025: WanGP v5.0
👋 One App to Rule Them All! Added support for the other great open source architectures:
- **Hunyuan Video**: text 2 video (one of the best, if not the best t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
- **LTX Video 13B** (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and reduced LTX Video VRAM requirements by 4!

Also:
- Added support for the best Control Video Model, released 2 days ago: Vace 14B
- New Integrated prompt enhancer to increase the quality of the generated videos

*You will need one more `pip install -r requirements.txt`*

### May 5, 2025: WanGP v4.5
👋 FantasySpeaking model, you can animate a talking head using a voice track. This works not only on people but also on objects. Also better seamless transitions between Vace sliding windows for very long videos. New high quality processing features (mixed 16/32 bits calculation and 32 bits VAE)

### April 27, 2025: WanGP v4.4
👋 Phantom model support, very good model to transfer people or objects into video, works quite well at 720p and with the number of steps > 30

### April 25, 2025: WanGP v4.3
👋 Added preview mode and support for Sky Reels v2 Diffusion Forcing for high quality "infinite length videos". Note that Skyreel uses causal attention that is only supported by Sdpa attention, so even if you choose another type of attention, some of the processes will use Sdpa attention.

### April 18, 2025: WanGP v4.2
👋 FLF2V model support, official support from Wan for image2video start and end frames, specialized for 720p.

### April 17, 2025: WanGP v4.1
👋 Recam Master model support, view a video from a different angle. The video to process must be at least 81 frames long and you should set at least 15 denoising steps to get good results.

### April 13, 2025: WanGP v4.0
👋 Lots of goodies for you!
- A new UI, tabs were replaced by a Dropdown box to easily switch models
- A new queuing system that lets you stack in a queue as many text2video, image2video tasks, ... as you want. Each task can rely on completely different generation parameters (different number of frames, steps, loras, ...). Many thanks to **Tophness** for being a big contributor on this new feature
- Temporal upsampling (Rife) and spatial upsampling (Lanczos) for a smoother video (32 fps or 64 fps) and to enlarge your video by x2 or x4. Check these new advanced options.
- Wan Vace Control Net support: with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ... See [VACE.md](VACE.md) for an introduction guide.
- Integrated *Matanyone* tool directly inside WanGP so that you can easily create inpainting masks used in Vace
- Sliding Window generation for Vace, create windows that can last dozens of seconds
- New optimizations for old generation GPUs: generate 5s (81 frames, 15 steps) of Vace 1.3B with only 5GB and in only 6 minutes on a RTX 2080Ti, and 5s of t2v 14B in less than 10 minutes.

### March 27, 2025
👋 Added support for the new Wan Fun InP models (image2video). The 14B Fun InP has probably better end image support but unfortunately existing loras do not work so well with it. The great novelty is the Fun InP image2video 1.3B model: Image 2 Video is now accessible to even lower hardware configurations. It is not as good as the 14B models but very impressive for its size. Many thanks to the VideoX-Fun team (https://github.com/aigc-apps/VideoX-Fun)

### March 26, 2025
👋 Good news! Official support for RTX 50xx; please check the [installation instructions](INSTALLATION.md).

### March 24, 2025: Wan2.1GP v3.2
👋
- Added Classifier-Free Guidance Zero Star. The video should match the text prompt better (especially with text2video) at no performance cost: many thanks to the **CFG Zero * Team**. Don't hesitate to give them a star if you appreciate the results: https://github.com/WeichenFan/CFG-Zero-star
- Added back support for PyTorch compilation with Loras. It seems it had been broken for some time
- Added the possibility to keep a number of pregenerated videos in the Video Gallery (useful to compare outputs of different settings)

*You will need one more `pip install -r requirements.txt`*

### March 19, 2025: Wan2.1GP v3.1
👋 Faster launch and RAM optimizations (should require less RAM to run)

*You will need one more `pip install -r requirements.txt`*

### March 18, 2025: Wan2.1GP v3.0
👋
- New Tab based interface, you can switch from i2v to t2v and conversely without restarting the app
- Experimental Dual Frames mode for i2v, you can also specify an End frame. It doesn't always work, so you will need a few attempts.
- You can save default settings in the files *i2v_settings.json* and *t2v_settings.json* that will be used when launching the app (you can also specify the path to different settings files)
- Slight acceleration with loras

*You will need one more `pip install -r requirements.txt`*

Many thanks to *Tophness* who created the framework (and did a big part of the work) of the multitabs and saved settings features

### March 18, 2025: Wan2.1GP v2.11
👋 Added more command line parameters to prefill the generation settings + customizable output directory and choice of type of metadata for generated videos. Many thanks to *Tophness* for his contributions.

*You will need one more `pip install -r requirements.txt` to reflect new dependencies*

### March 18, 2025: Wan2.1GP v2.1
👋 More Loras!: added support for 'Safetensors' and 'Replicate' Lora formats.

*You will need to refresh the requirements with a `pip install -r requirements.txt`*

### March 17, 2025: Wan2.1GP v2.0
👋 The Lora festival continues:
- Clearer user interface
- Download 30 Loras in one click to try them all (expand the info section)
- Loras are now very easy to use: Lora presets can input the subject (or other needed terms) of the Lora so that you don't have to modify a prompt manually
- Added basic macro prompt language to prefill prompts with different values. With one prompt template, you can generate multiple prompts.
- New Multiple images prompts: you can now combine any number of images with any number of text prompts (need to launch the app with --multiple-images)
- New command line options to launch directly the 1.3B t2v model or the 14B t2v model

### March 14, 2025: Wan2.1GP v1.7
👋
- Lora Fest special edition: very fast loading/unloading of loras for those Lora collectors around. You can also now add/remove loras in the Lora folder without restarting the app.
- Added experimental Skip Layer Guidance (advanced settings), that should improve the image quality at no extra cost. Many thanks to *AmericanPresidentJimmyCarter* for the original implementation

*You will need to refresh the requirements with `pip install -r requirements.txt`*

### March 13, 2025: Wan2.1GP v1.6
👋 Better Loras support, accelerated loading of Loras.

*You will need to refresh the requirements with `pip install -r requirements.txt`*

### March 10, 2025: Wan2.1GP v1.5
👋 Official Teacache support + Smart Teacache (finds automatically the best parameters for a requested speed multiplier), 10% speed boost with no quality loss, improved lora presets (they can now include prompts and comments to guide the user)

### March 7, 2025: Wan2.1GP v1.4
👋 Fixed PyTorch compilation, now it is really 20% faster when activated

### March 4, 2025: Wan2.1GP v1.3
👋 Support for Image to Video with multiple images for different images/prompts combinations (requires the *--multiple-images* switch), and added command line *--preload x* to preload x MB of the main diffusion model in VRAM if you find there is too much unused VRAM and you want to (slightly) accelerate the generation process.

*If you upgrade you will need to do a `pip install -r requirements.txt` again.*

### March 4, 2025: Wan2.1GP v1.2
👋 Implemented tiling on VAE encoding and decoding. No more VRAM peaks at the beginning and at the end

### March 3, 2025: Wan2.1GP v1.1
👋 Added Tea Cache support for faster generations: optimization of kijai's implementation (https://github.com/kijai/ComfyUI-WanVideoWrapper/) of teacache (https://github.com/ali-vilab/TeaCache)

### March 2, 2025: Wan2.1GP by DeepBeepMeep v1
👋 Brings:
- Support for all Wan models including the Image to Video model
- Reduced memory consumption by 2, with the possibility to generate more than 10s of video at 720p with a RTX 4090 and 10s of video at 480p with less than 12GB of VRAM. Many thanks to RIFLEx (https://github.com/thu-ml/RIFLEx) for their algorithm that allows generating nice looking videos longer than 5s.
- The usual perks: web interface, multiple generations, loras support, sage attention, auto download of models, ...

## Original Wan Releases

### February 25, 2025
👋 We've released the inference code and weights of Wan2.1.

### February 27, 2025
👋 Wan2.1 has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!

docs/CLI.md
ADDED
@@ -0,0 +1,226 @@
# Command Line Reference

This document covers all available command line options for WanGP.

## Basic Usage

```bash
# Default launch
python wgp.py

# Specific model modes
python wgp.py --i2v        # Image-to-video
python wgp.py --t2v        # Text-to-video (default)
python wgp.py --t2v-14B    # 14B text-to-video model
python wgp.py --t2v-1-3B   # 1.3B text-to-video model
python wgp.py --i2v-14B    # 14B image-to-video model
python wgp.py --i2v-1-3B   # Fun InP 1.3B image-to-video model
python wgp.py --vace-1-3B  # VACE ControlNet 1.3B model
```

## Model and Performance Options

### Model Configuration
```bash
--quantize-transformer BOOL # Enable/disable transformer quantization (default: True)
--compile                   # Enable PyTorch compilation (requires Triton)
--attention MODE            # Force attention mode: sdpa, flash, sage, sage2
--profile NUMBER            # Performance profile 1-5 (default: 4)
--preload NUMBER            # Preload N MB of diffusion model in VRAM
--fp16                      # Force fp16 instead of bf16 models
--gpu DEVICE                # Run on specific GPU device (e.g., "cuda:1")
```

### Performance Profiles
- **Profile 1**: Load entire current model in VRAM and keep all unused models in reserved RAM for fast VRAM transfers
- **Profile 2**: Load model parts as needed, keep all unused models in reserved RAM for fast VRAM transfers
- **Profile 3**: Load entire current model in VRAM (requires 24GB for a 14B model)
- **Profile 4**: Default and recommended, load model parts as needed, most flexible option
- **Profile 5**: Minimum RAM usage

### Memory Management
```bash
--perc-reserved-mem-max FLOAT # Max percentage of RAM for reserved memory (< 0.5)
```

## Lora Configuration

```bash
--lora-dir PATH             # Path to Wan t2v loras directory
--lora-dir-i2v PATH         # Path to Wan i2v loras directory
--lora-dir-hunyuan PATH     # Path to Hunyuan t2v loras directory
--lora-dir-hunyuan-i2v PATH # Path to Hunyuan i2v loras directory
--lora-dir-ltxv PATH        # Path to LTX Video loras directory
--lora-preset PRESET        # Load lora preset file (.lset) on startup
--check-loras               # Filter incompatible loras (slower startup)
```

## Generation Settings

### Basic Generation
```bash
--seed NUMBER   # Set default seed value
--frames NUMBER # Set default number of frames to generate
--steps NUMBER  # Set default number of denoising steps
--advanced      # Launch with advanced mode enabled
```

### Advanced Generation
```bash
--teacache MULTIPLIER # TeaCache speed multiplier: 0, 1.5, 1.75, 2.0, 2.25, 2.5
```

## Interface and Server Options

### Server Configuration
```bash
--server-port PORT # Gradio server port (default: 7860)
--server-name NAME # Gradio server name (default: localhost)
--listen           # Make server accessible on network
--share            # Create shareable HuggingFace URL for remote access
--open-browser     # Open browser automatically when launching
```

### Interface Options
```bash
--lock-config      # Prevent modifying video engine configuration from interface
--theme THEME_NAME # UI theme: "default" or "gradio"
```

## File and Directory Options

```bash
--settings PATH # Path to folder containing default settings for all models
--verbose LEVEL # Information level 0-2 (default: 1)
```

## Examples

### Basic Usage Examples
```bash
# Launch with specific model and loras
python wgp.py --t2v-14B --lora-preset mystyle.lset

# High-performance setup with compilation
python wgp.py --compile --attention sage2 --profile 3

# Low VRAM setup
python wgp.py --t2v-1-3B --profile 4 --attention sdpa

# Multiple images with custom lora directory
python wgp.py --i2v --multiple-images --lora-dir /path/to/shared/loras
```
|
112 |
+
```
|
113 |
+
|
114 |
+
### Server Configuration Examples
|
115 |
+
```bash
|
116 |
+
# Network accessible server
|
117 |
+
python wgp.py --listen --server-port 8080
|
118 |
+
|
119 |
+
# Shareable server with custom theme
|
120 |
+
python wgp.py --share --theme gradio --open-browser
|
121 |
+
|
122 |
+
# Locked configuration for public use
|
123 |
+
python wgp.py --lock-config --share
|
124 |
+
```
|
125 |
+
|
126 |
+
### Advanced Performance Examples
|
127 |
+
```bash
|
128 |
+
# Maximum performance (requires high-end GPU)
|
129 |
+
python wgp.py --compile --attention sage2 --profile 3 --preload 2000
|
130 |
+
|
131 |
+
# Optimized for RTX 2080Ti
|
132 |
+
python wgp.py --profile 4 --attention sdpa --teacache 2.0
|
133 |
+
|
134 |
+
# Memory-efficient setup
|
135 |
+
python wgp.py --fp16 --profile 4 --perc-reserved-mem-max 0.3
|
136 |
+
```
|
137 |
+
|
138 |
+
### TeaCache Configuration
|
139 |
+
```bash
|
140 |
+
# Different speed multipliers
|
141 |
+
python wgp.py --teacache 1.5 # 1.5x speed, minimal quality loss
|
142 |
+
python wgp.py --teacache 2.0 # 2x speed, some quality loss
|
143 |
+
python wgp.py --teacache 2.5 # 2.5x speed, noticeable quality loss
|
144 |
+
python wgp.py --teacache 0 # Disable TeaCache
|
145 |
+
```
|
146 |
+
|
147 |
+
## Attention Modes
|
148 |
+
|
149 |
+
### SDPA (Default)
|
150 |
+
```bash
|
151 |
+
python wgp.py --attention sdpa
|
152 |
+
```
|
153 |
+
- Available by default with PyTorch
|
154 |
+
- Good compatibility with all GPUs
|
155 |
+
- Moderate performance
|
156 |
+
|
157 |
+
### Sage Attention
|
158 |
+
```bash
|
159 |
+
python wgp.py --attention sage
|
160 |
+
```
|
161 |
+
- Requires Triton installation
|
162 |
+
- 30% faster than SDPA
|
163 |
+
- Small quality cost
|
164 |
+
|
165 |
+
### Sage2 Attention
|
166 |
+
```bash
|
167 |
+
python wgp.py --attention sage2
|
168 |
+
```
|
169 |
+
- Requires Triton and SageAttention 2.x
|
170 |
+
- 40% faster than SDPA
|
171 |
+
- Best performance option
|
172 |
+
|
173 |
+
### Flash Attention
|
174 |
+
```bash
|
175 |
+
python wgp.py --attention flash
|
176 |
+
```
|
177 |
+
- May require CUDA kernel compilation
|
178 |
+
- Good performance
|
179 |
+
- Can be complex to install on Windows
|
180 |
+
|
181 |
+
## Troubleshooting Command Lines
|
182 |
+
|
183 |
+
### Fallback to Basic Setup
|
184 |
+
```bash
|
185 |
+
# If advanced features don't work
|
186 |
+
python wgp.py --attention sdpa --profile 4 --fp16
|
187 |
+
```
|
188 |
+
|
189 |
+
### Debug Mode
|
190 |
+
```bash
|
191 |
+
# Maximum verbosity for troubleshooting
|
192 |
+
python wgp.py --verbose 2 --check-loras
|
193 |
+
```
|
194 |
+
|
195 |
+
### Memory Issue Debugging
|
196 |
+
```bash
|
197 |
+
# Minimal memory usage
|
198 |
+
python wgp.py --profile 4 --attention sdpa --perc-reserved-mem-max 0.2
|
199 |
+
```
|
200 |
+
|
201 |
+
|
202 |
+
|
203 |
+
## Configuration Files
|
204 |
+
|
205 |
+
### Settings Files
|
206 |
+
Load custom settings:
|
207 |
+
```bash
|
208 |
+
python wgp.py --settings /path/to/settings/folder
|
209 |
+
```
|
210 |
+
|
211 |
+
### Lora Presets
|
212 |
+
Create and share lora configurations:
|
213 |
+
```bash
|
214 |
+
# Load specific preset
|
215 |
+
python wgp.py --lora-preset anime_style.lset
|
216 |
+
|
217 |
+
# With custom lora directory
|
218 |
+
python wgp.py --lora-preset mystyle.lset --lora-dir /shared/loras
|
219 |
+
```
|
220 |
+
|
221 |
+
## Environment Variables
|
222 |
+
|
223 |
+
While not command line options, these environment variables can affect behavior:
|
224 |
+
- `CUDA_VISIBLE_DEVICES` - Limit visible GPUs
|
225 |
+
- `PYTORCH_CUDA_ALLOC_CONF` - CUDA memory allocation settings
|
226 |
+
- `TRITON_CACHE_DIR` - Triton cache directory (for Sage attention)
|
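
As a minimal sketch (only the variable names above are taken from this list; the cache path and flags are illustrative assumptions), these can be set for a single launch without touching your shell profile:

```python
import os
import subprocess

# Copy the current environment and override only what we need:
# restrict WanGP to the first GPU and point Triton's cache at a writable folder.
env = dict(os.environ)
env["CUDA_VISIBLE_DEVICES"] = "0"
env["TRITON_CACHE_DIR"] = "/tmp/triton_cache"  # hypothetical path, adjust as needed

# Launch WanGP as a child process with the modified environment.
subprocess.run(["python", "wgp.py", "--attention", "sage"], env=env)
```

On Linux/macOS the equivalent one-liner is `CUDA_VISIBLE_DEVICES=0 python wgp.py`.
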
docs/GETTING_STARTED.md
ADDED
@@ -0,0 +1,194 @@
# Getting Started with WanGP

This guide will help you get started with WanGP video generation quickly and easily.

## Prerequisites

Before starting, ensure you have:
- A compatible GPU (RTX 10XX or newer recommended)
- Python 3.10.9 installed
- At least 6GB of VRAM for basic models
- Internet connection for model downloads

## Quick Setup

### Option 1: One-Click Installation (Recommended)
Use [Pinokio App](https://pinokio.computer/) for the easiest installation experience.

### Option 2: Manual Installation
```bash
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP
conda create -n wan2gp python=3.10.9
conda activate wan2gp
pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
pip install -r requirements.txt
```

For detailed installation instructions, see [INSTALLATION.md](INSTALLATION.md).

## First Launch

### Basic Launch
```bash
python wgp.py
```
This launches the WanGP generator with default settings. You will be able to pick the model you want to use from a dropdown menu.

### Alternative Modes
```bash
python wgp.py --i2v       # Wan image-to-video mode
python wgp.py --t2v-1-3B  # Wan smaller, faster model
```

## Understanding the Interface

When you launch WanGP, you'll see a web interface with several sections:

### Main Generation Panel
- **Model Selection**: Dropdown to choose between different models
- **Prompt**: Text description of what you want to generate
- **Generate Button**: Start the video generation process

### Advanced Settings (click checkbox to enable)
- **Generation Settings**: Steps, guidance, seeds
- **Loras**: Additional style customizations
- **Sliding Window**: For longer videos

## Your First Video

Let's generate a simple text-to-video:

1. **Launch WanGP**: `python wgp.py`
2. **Open Browser**: Navigate to `http://localhost:7860`
3. **Enter Prompt**: "A cat walking in a garden"
4. **Click Generate**: Wait for the video to be created
5. **View Result**: The video will appear in the output section

### Recommended First Settings
- **Model**: Wan 2.1 text2video 1.3B (faster, lower VRAM)
- **Frames**: 49 (about 2 seconds)
- **Steps**: 20 (good balance of speed/quality)

## Model Selection

### Text-to-Video Models
- **Wan 2.1 T2V 1.3B**: Fastest, lowest VRAM (6GB), good quality
- **Wan 2.1 T2V 14B**: Best quality, requires more VRAM (12GB+)
- **Hunyuan Video**: Excellent quality, slower generation
- **LTX Video**: Good for longer videos

### Image-to-Video Models
- **Wan Fun InP 1.3B**: Fast image animation
- **Wan Fun InP 14B**: Higher quality image animation
- **VACE**: Advanced control over video generation

### Choosing the Right Model
- **Low VRAM (6-8GB)**: Use 1.3B models
- **Medium VRAM (10-12GB)**: Use 14B models or Hunyuan
- **High VRAM (16GB+)**: Any model, longer videos

## Basic Settings Explained

### Generation Settings
- **Frames**: Number of frames (more = longer video); see the sketch after this list for the rough conversion
  - 25 frames ≈ 1 second
  - 49 frames ≈ 2 seconds
  - 73 frames ≈ 3 seconds

- **Steps**: Quality vs speed tradeoff
  - 15 steps: Fast, lower quality
  - 20 steps: Good balance
  - 30+ steps: High quality, slower

- **Guidance Scale**: How closely to follow the prompt
  - 3-5: More creative interpretation
  - 7-10: Closer to prompt description
  - 12+: Very literal interpretation
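
A minimal sketch of the frame math implied by the numbers above (assuming the roughly 24 fps these figures suggest, so `frames = 24 * seconds + 1`; the helper name is ours, not WanGP's):

```python
def seconds_to_frames(seconds: float, fps: int = 24) -> int:
    """Approximate frame count for a target duration.

    Matches the table above: 1s -> 25, 2s -> 49, 3s -> 73 frames
    (diffusion video models often expect counts of the form n*k + 1).
    """
    return int(round(fps * seconds)) + 1

print(seconds_to_frames(2))  # 49
```
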
### Seeds
- **Random Seed**: Different result each time
- **Fixed Seed**: Reproducible results
- **Use same seed + prompt**: Generate variations

## Common Beginner Issues

### "Out of Memory" Errors
1. Use smaller models (1.3B instead of 14B)
2. Reduce frame count
3. Lower resolution in advanced settings
4. Enable quantization (usually on by default)

### Slow Generation
1. Use 1.3B models for speed
2. Reduce number of steps
3. Install Sage attention (see [INSTALLATION.md](INSTALLATION.md))
4. Enable TeaCache: `python wgp.py --teacache 2.0`

### Poor Quality Results
1. Increase number of steps (25-30)
2. Improve prompt description
3. Use 14B models if you have enough VRAM
4. Enable Skip Layer Guidance in advanced settings

## Writing Good Prompts

### Basic Structure
```
[Subject] [Action] [Setting] [Style/Quality modifiers]
```

### Examples
```
A red sports car driving through a mountain road at sunset, cinematic, high quality

A woman with long hair walking on a beach, waves in the background, realistic, detailed

A cat sitting on a windowsill watching rain, cozy atmosphere, soft lighting
```

### Tips
- Be specific about what you want
- Include style descriptions (cinematic, realistic, etc.)
- Mention lighting and atmosphere
- Describe the setting in detail
- Use quality modifiers (high quality, detailed, etc.)

## Next Steps

Once you're comfortable with basic generation:

1. **Explore Advanced Features**:
   - [Loras Guide](LORAS.md) - Customize styles and characters
   - [VACE ControlNet](VACE.md) - Advanced video control
   - [Command Line Options](CLI.md) - Optimize performance

2. **Improve Performance**:
   - Install better attention mechanisms
   - Optimize memory settings
   - Use compilation for speed

3. **Join the Community**:
   - [Discord Server](https://discord.gg/g7efUW9jGV) - Get help and share videos
   - Share your best results
   - Learn from other users

## Troubleshooting First Steps

### Installation Issues
- Ensure Python 3.10.9 is used
- Check CUDA version compatibility
- See [INSTALLATION.md](INSTALLATION.md) for detailed steps

### Generation Issues
- Check GPU compatibility
- Verify sufficient VRAM
- Try basic settings first
- See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for specific issues

### Performance Issues
- Use the appropriate model for your hardware
- Enable performance optimizations
- Check [CLI.md](CLI.md) for optimization flags

Remember: Start simple and gradually explore more advanced features as you become comfortable with the basics!

docs/INSTALLATION.md
ADDED
@@ -0,0 +1,170 @@
# Installation Guide

This guide covers installation for different GPU generations and operating systems.

## Requirements

- Python 3.10.9
- Conda or Python venv
- Compatible GPU (RTX 10XX or newer recommended)

## Installation for RTX 10XX to RTX 40XX (Stable)

This installation uses PyTorch 2.6.0, which is well-tested and stable.

### Step 1: Download and Setup Environment

```shell
# Clone the repository
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP

# Create Python 3.10.9 environment using conda
conda create -n wan2gp python=3.10.9
conda activate wan2gp
```

### Step 2: Install PyTorch

```shell
# Install PyTorch 2.6.0 with CUDA 12.4
pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
```

### Step 3: Install Dependencies

```shell
# Install core dependencies
pip install -r requirements.txt
```

### Step 4: Optional Performance Optimizations

#### Sage Attention (30% faster)

```shell
# Windows only: Install Triton
pip install triton-windows

# For both Windows and Linux
pip install sageattention==1.0.6
```

#### Sage 2 Attention (40% faster)

```shell
# Windows
pip install triton-windows
pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl

# Linux (manual compilation required)
git clone https://github.com/thu-ml/SageAttention
cd SageAttention
pip install -e .
```

#### Flash Attention

```shell
# May require CUDA kernel compilation on Windows
pip install flash-attn==2.7.2.post1
```

## Installation for RTX 50XX (Beta)

RTX 50XX GPUs require PyTorch 2.7.0 (beta). This version may be less stable.

⚠️ **Important:** Use Python 3.10 for compatibility with pip wheels.

### Step 1: Setup Environment

```shell
# Clone and setup (same as above)
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP
conda create -n wan2gp python=3.10.9
conda activate wan2gp
```

### Step 2: Install PyTorch Beta

```shell
# Install PyTorch 2.7.0 with CUDA 12.8
pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
```

### Step 3: Install Dependencies

```shell
pip install -r requirements.txt
```

### Step 4: Optional Optimizations for RTX 50XX

#### Sage Attention

```shell
# Windows
pip install triton-windows
pip install sageattention==1.0.6

# Linux
pip install sageattention==1.0.6
```

#### Sage 2 Attention

```shell
# Windows
pip install triton-windows
pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl

# Linux (manual compilation)
git clone https://github.com/thu-ml/SageAttention
cd SageAttention
pip install -e .
```

## Attention Modes

WanGP supports several attention implementations:

- **SDPA** (default): Available by default with PyTorch
- **Sage**: 30% speed boost with small quality cost
- **Sage2**: 40% speed boost
- **Flash**: Good performance, may be complex to install on Windows

## Performance Profiles

Choose a profile based on your hardware:

- **Profile 3 (LowRAM_HighVRAM)**: Loads the entire model in VRAM, requires 24GB VRAM for an 8-bit quantized 14B model
- **Profile 4 (LowRAM_LowVRAM)**: Default; loads model parts as needed, slower but lower VRAM requirement

## Troubleshooting

### Sage Attention Issues

If Sage attention doesn't work:

1. Check if Triton is properly installed
2. Clear the Triton cache
3. Fall back to SDPA attention:
```bash
python wgp.py --attention sdpa
```

### Memory Issues

- Use lower resolution or shorter videos
- Enable quantization (default)
- Use Profile 4 for lower VRAM usage
- Consider using 1.3B models instead of 14B models

### GPU Compatibility

- RTX 10XX, 20XX: Supported with SDPA attention
- RTX 30XX, 40XX: Full feature support
- RTX 50XX: Beta support with PyTorch 2.7.0

For more troubleshooting, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md)

docs/LORAS.md
ADDED
@@ -0,0 +1,197 @@
# Loras Guide

Loras (Low-Rank Adaptations) allow you to customize video generation models by adding specific styles, characters, or effects to your videos.

## Directory Structure

Loras are organized in different folders based on the model they're designed for:

### Text-to-Video Models
- `loras/` - General t2v loras
- `loras/1.3B/` - Loras specifically for 1.3B models
- `loras/14B/` - Loras specifically for 14B models

### Image-to-Video Models
- `loras_i2v/` - Image-to-video loras

### Other Models
- `loras_hunyuan/` - Hunyuan Video t2v loras
- `loras_hunyuan_i2v/` - Hunyuan Video i2v loras
- `loras_ltxv/` - LTX Video loras

## Custom Lora Directory

You can specify custom lora directories when launching the app:

```bash
# Use shared lora directory for both t2v and i2v
python wgp.py --lora-dir /path/to/shared/loras --lora-dir-i2v /path/to/shared/loras

# Specify different directories for different models
python wgp.py --lora-dir-hunyuan /path/to/hunyuan/loras --lora-dir-ltxv /path/to/ltx/loras
```

## Using Loras

### Basic Usage

1. Place your lora files in the appropriate directory
2. Launch WanGP
3. In the Advanced Tab, select the "Loras" section
4. Check the loras you want to activate
5. Set multipliers for each lora (default is 1.0)

### Lora Multipliers

Multipliers control the strength of each lora's effect:

#### Simple Multipliers
```
1.2 0.8
```
- First lora: 1.2 strength
- Second lora: 0.8 strength

#### Time-based Multipliers
For dynamic effects over generation steps, use comma-separated values (one line per lora); the sketch below illustrates the mapping:
```
0.9,0.8,0.7
1.2,1.1,1.0
```
- For 30 steps: steps 0-9 use the first value, 10-19 the second, 20-29 the third
- First lora: 0.9 → 0.8 → 0.7
- Second lora: 1.2 → 1.1 → 1.0
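
As a minimal illustration of how such a schedule could map denoising steps to multiplier values (our own sketch, not WanGP's actual implementation):

```python
def multiplier_for_step(schedule: str, step: int, total_steps: int) -> float:
    """Pick the multiplier phase for a given denoising step.

    "0.9,0.8,0.7" over 30 steps -> 0.9 for steps 0-9,
    0.8 for steps 10-19, 0.7 for steps 20-29.
    """
    values = [float(v) for v in schedule.split(",")]
    phase = min(step * len(values) // total_steps, len(values) - 1)
    return values[phase]

assert multiplier_for_step("0.9,0.8,0.7", 5, 30) == 0.9
assert multiplier_for_step("0.9,0.8,0.7", 15, 30) == 0.8
assert multiplier_for_step("0.9,0.8,0.7", 29, 30) == 0.7
```
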
## Lora Presets

Presets are combinations of loras with predefined multipliers and prompts.

### Creating Presets
1. Configure your loras and multipliers
2. Write a prompt with comments (lines starting with #)
3. Save as a preset with the `.lset` extension

### Example Preset
```
# Use the keyword "ohnvx" to trigger the lora
A ohnvx character is driving a car through the city
```

### Using Presets
```bash
# Load preset on startup
python wgp.py --lora-preset mypreset.lset
```

### Managing Presets
- Edit, save, or delete presets directly from the web interface
- Presets include comments with usage instructions
- Share `.lset` files with other users

## CausVid Lora (Special)

CausVid is a distilled Wan model that generates videos in 4-12 steps with a 2x speed improvement.

### Setup Instructions
1. Download the CausVid Lora:
```
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
```
2. Place it in your `loras/` directory

### Usage
1. Select a Wan t2v model (e.g., Wan 2.1 text2video 14B or Vace 14B)
2. Enable Advanced Mode
3. In the Advanced Generation Tab:
   - Set Guidance Scale = 1
   - Set Shift Scale = 7
4. In the Advanced Lora Tab:
   - Select the CausVid Lora
   - Set the multiplier to 0.3
5. Set generation steps to 12
6. Generate!

### CausVid Step/Multiplier Relationship
- **12 steps**: 0.3 multiplier (recommended)
- **8 steps**: 0.5-0.7 multiplier
- **4 steps**: 0.8-1.0 multiplier

*Note: Lower steps = lower quality (especially motion)*

## Supported Formats

WanGP supports multiple lora formats:
- **Safetensors** (.safetensors)
- **Replicate** format
- **Standard PyTorch** (.pt, .pth)

## Performance Tips

### Fast Loading/Unloading
- Loras can be added/removed without restarting the app
- Use the "Refresh" button to detect new loras
- Enable `--check-loras` to filter incompatible loras (slower startup)

### Memory Management
- Loras are loaded on-demand to save VRAM
- Multiple loras can be used simultaneously
- Time-based multipliers don't use extra memory

## Finding Loras

### Sources
- **[Civitai](https://civitai.com/)** - Large community collection
- **HuggingFace** - Official and community loras
- **Discord Server** - Community recommendations

### Creating Loras
- **Kohya** - Popular training tool
- **OneTrainer** - Alternative training solution
- **Custom datasets** - Train on your own content

## Macro System (Advanced)

Create multiple prompts from templates using macros:

```
! {Subject}="cat","woman","man", {Location}="forest","lake","city", {Possessive}="its","her","his"
In the video, a {Subject} is presented. The {Subject} is in a {Location} and looks at {Possessive} watch.
```

This generates:
1. "In the video, a cat is presented. The cat is in a forest and looks at its watch."
2. "In the video, a woman is presented. The woman is in a lake and looks at her watch."
3. "In the video, a man is presented. The man is in a city and looks at his watch."
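
Note that the variables are substituted position by position (the first value of each variable together, then the second, and so on), not as a cross product. A minimal re-implementation sketch of that behavior (ours, for illustration only):

```python
template = ("In the video, a {Subject} is presented. The {Subject} is in a "
            "{Location} and looks at {Possessive} watch.")
variables = {
    "Subject": ["cat", "woman", "man"],
    "Location": ["forest", "lake", "city"],
    "Possessive": ["its", "her", "his"],
}

# Zip the value lists: the i-th prompt uses the i-th value of every variable.
for values in zip(*variables.values()):
    mapping = dict(zip(variables.keys(), values))
    print(template.format(**mapping))
```
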
## Troubleshooting

### Lora Not Working
1. Check if the lora is compatible with your model size (1.3B vs 14B)
2. Verify the lora format is supported
3. Try different multiplier values
4. Check the lora was trained for your model type (t2v vs i2v)

### Performance Issues
1. Reduce the number of active loras
2. Lower multiplier values
3. Use `--check-loras` to filter incompatible files
4. Clear the lora cache if issues persist

### Memory Errors
1. Use fewer loras simultaneously
2. Reduce model size (use 1.3B instead of 14B)
3. Lower video resolution or frame count
4. Enable quantization if not already active

## Command Line Options

```bash
# Lora-related command line options
--lora-dir path             # Path to t2v loras directory
--lora-dir-i2v path         # Path to i2v loras directory
--lora-dir-hunyuan path     # Path to Hunyuan t2v loras
--lora-dir-hunyuan-i2v path # Path to Hunyuan i2v loras
--lora-dir-ltxv path        # Path to LTX Video loras
--lora-preset preset        # Load preset on startup
--check-loras               # Filter incompatible loras
```

docs/MODELS.md
ADDED
@@ -0,0 +1,268 @@
# Models Overview

WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations.

## Wan 2.1 Text2Video Models
Please note that the term *Text2Video* refers to the underlying Wan architecture; as it has been greatly improved over time, many derived Text2Video models can now also generate videos from images.

#### Wan 2.1 Text2Video 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Speed**: Fast generation
- **Quality**: Good quality for the size
- **Best for**: Quick iterations, lower-end hardware
- **Command**: `python wgp.py --t2v-1-3B`

#### Wan 2.1 Text2Video 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Final production videos
- **Command**: `python wgp.py --t2v-14B`

#### Wan Vace 1.3B
- **Type**: ControlNet for advanced video control
- **VRAM**: 6GB minimum
- **Features**: Motion transfer, object injection, inpainting
- **Best for**: Advanced video manipulation
- **Command**: `python wgp.py --vace-1-3B`

#### Wan Vace 14B
- **Type**: Large ControlNet model
- **VRAM**: 12GB+ recommended
- **Features**: All Vace features with higher quality
- **Best for**: Professional video editing workflows

#### MoviiGen (Experimental)
- **Resolution**: Claims 1080p capability
- **VRAM**: 20GB+ required
- **Speed**: Very slow generation
- **Features**: Aims at cinema-like video, specialized for 2.1:1 aspect ratios
- **Status**: Experimental, feedback welcome

<BR>

## Wan 2.1 Image-to-Video Models

#### Wan 2.1 Image2Video 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: General image-to-video; most available loras work with this model
- **Command**: `python wgp.py --i2v-14B`

#### FLF2V
- **Type**: Start/end frame specialist
- **Resolution**: Optimized for 720p
- **Official**: Wan team supported
- **Use case**: Image-to-video with specific endpoints

<BR>

## Wan 2.1 Specialized Models

#### FantasySpeaking
- **Type**: Talking head animation
- **Input**: Voice track + image
- **Works on**: People and objects
- **Use case**: Lip-sync and voice-driven animation

#### Phantom
- **Type**: Person/object transfer
- **Resolution**: Works well at 720p
- **Requirements**: 30+ steps for good results
- **Best for**: Transferring subjects between videos

#### Recam Master
- **Type**: Viewpoint change
- **Requirements**: 81+ frame input videos, 15+ denoising steps
- **Use case**: View the same scene from different angles

#### Sky Reels v2
- **Type**: Diffusion Forcing model
- **Specialty**: "Infinite length" videos
- **Features**: High quality continuous generation

<BR>

## Wan Fun InP Models

#### Wan Fun InP 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Quality**: Good for the size, accessible to lower-end hardware
- **Best for**: Entry-level image animation
- **Command**: `python wgp.py --i2v-1-3B`

#### Wan Fun InP 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Quality**: Better end-image support
- **Limitation**: Existing loras don't work as well

<BR>

## Wan Special Loras
### CausVid
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-12 step generation, 2x faster
- **Compatible**: Works with Wan 14B models
- **Setup**: Requires the CausVid Lora (see [LORAS.md](LORAS.md))

<BR>

## Hunyuan Video Models

#### Hunyuan Video Text2Video
- **Quality**: Among the best open source t2v models
- **VRAM**: 12GB+ recommended
- **Speed**: Slower generation but excellent results
- **Features**: Superior text adherence and video quality, up to 10s of video
- **Best for**: High-quality text-to-video generation

#### Hunyuan Video Custom
- **Specialty**: Identity preservation
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation

#### Hunyuan Video Avatar
- **Specialty**: Generates up to 15s of high quality speech/song-driven video
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation, video synchronized with voice

<BR>

## LTX Video Models

#### LTX Video 13B
- **Specialty**: Long video generation
- **Resolution**: Fast 720p generation
- **VRAM**: Optimized by WanGP (4x reduction in requirements)
- **Best for**: Longer duration videos

#### LTX Video 13B Distilled
- **Speed**: Generates in less than one minute
- **Quality**: Very high quality despite the speed
- **Best for**: Rapid prototyping and quick results

<BR>

## Model Selection Guide

### By Hardware (VRAM)

#### 6-8GB VRAM
- Wan 2.1 T2V 1.3B
- Wan Fun InP 1.3B
- Wan Vace 1.3B

#### 10-12GB VRAM
- Wan 2.1 T2V 14B
- Wan Fun InP 14B
- Hunyuan Video (with optimizations)
- LTX Video 13B

#### 16GB+ VRAM
- All models supported
- Longer videos possible
- Higher resolutions
- Multiple simultaneous loras

#### 20GB+ VRAM
- MoviiGen (experimental 1080p)
- Very long videos
- Maximum quality settings

### By Use Case

#### Quick Prototyping
1. **LTX Video 13B Distilled** - Fastest, high quality
2. **Wan 2.1 T2V 1.3B** - Fast, good quality
3. **CausVid Lora** - 4-12 steps, very fast

#### Best Quality
1. **Hunyuan Video** - Overall best t2v quality
2. **Wan 2.1 T2V 14B** - Excellent Wan quality
3. **Wan Vace 14B** - Best for controlled generation

#### Advanced Control
1. **Wan Vace 14B/1.3B** - Motion transfer, object injection
2. **Phantom** - Person/object transfer
3. **FantasySpeaking** - Voice-driven animation

#### Long Videos
1. **LTX Video 13B** - Specialized for length
2. **Sky Reels v2** - Infinite length videos
3. **Wan Vace + Sliding Windows** - Up to 1 minute

#### Lower Hardware
1. **Wan Fun InP 1.3B** - Image-to-video
2. **Wan 2.1 T2V 1.3B** - Text-to-video
3. **Wan Vace 1.3B** - Advanced control

<BR>

## Performance Comparison

### Speed (Relative)
1. **CausVid Lora** (4-12 steps) - Fastest
2. **LTX Video Distilled** - Very fast
3. **Wan 1.3B models** - Fast
4. **Wan 14B models** - Medium
5. **Hunyuan Video** - Slower
6. **MoviiGen** - Slowest

### Quality (Subjective)
1. **Hunyuan Video** - Highest overall
2. **Wan 14B models** - Excellent
3. **LTX Video models** - Very good
4. **Wan 1.3B models** - Good
5. **CausVid** - Good (varies with steps)

### VRAM Efficiency
1. **Wan 1.3B models** - Most efficient
2. **LTX Video** (with WanGP optimizations)
3. **Wan 14B models**
4. **Hunyuan Video**
5. **MoviiGen** - Least efficient

<BR>

## Model Switching

WanGP allows switching between models without restarting:

1. Use the dropdown menu in the web interface
2. Models are loaded on demand
3. The previous model is unloaded to save VRAM
4. Settings are preserved when possible

<BR>

## Tips for Model Selection

### First Time Users
Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware.

### Production Work
Use **Hunyuan Video** or **Wan 14B** models for final output quality.

### Experimentation
**CausVid Lora** or **LTX Distilled** for rapid iteration and testing.

### Specialized Tasks
- **VACE** for advanced control
- **FantasySpeaking** for talking heads
- **LTX Video** for long sequences

### Hardware Optimization
Start with the largest model your VRAM can handle, then tune settings for speed vs quality based on your needs.

docs/TROUBLESHOOTING.md
ADDED
@@ -0,0 +1,338 @@
# Troubleshooting Guide

This guide covers common issues and their solutions when using WanGP.

## Installation Issues

### PyTorch Installation Problems

#### CUDA Version Mismatch
**Problem**: PyTorch can't detect the GPU, or CUDA errors occur
**Solution**:
```bash
# Check your CUDA version
nvidia-smi

# Install matching PyTorch version
# For CUDA 12.4 (RTX 10XX-40XX)
pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124

# For CUDA 12.8 (RTX 50XX)
pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
```

#### Python Version Issues
**Problem**: Package compatibility errors
**Solution**: Ensure you're using Python 3.10.9
```bash
python --version  # Should show 3.10.9
conda create -n wan2gp python=3.10.9
```

### Dependency Installation Failures

#### Triton Installation (Windows)
**Problem**: `pip install triton-windows` fails
**Solution**:
1. Update pip: `pip install --upgrade pip`
2. Try a pre-compiled wheel
3. Fall back to SDPA attention: `python wgp.py --attention sdpa`

#### SageAttention Compilation Issues
**Problem**: SageAttention installation fails
**Solution**:
1. Install Visual Studio Build Tools (Windows)
2. Use pre-compiled wheels when available
3. Fall back to basic attention modes

## Memory Issues

### CUDA Out of Memory

#### During Model Loading
**Problem**: "CUDA out of memory" when loading a model
**Solutions**:
```bash
# Use smaller model
python wgp.py --t2v-1-3B

# Enable quantization (usually default)
python wgp.py --quantize-transformer True

# Use memory-efficient profile
python wgp.py --profile 4

# Reduce preloaded model size
python wgp.py --preload 0
```

#### During Video Generation
**Problem**: Memory error during generation
**Solutions**:
1. Reduce frame count (shorter videos)
2. Lower resolution in advanced settings
3. Use a lower batch size
4. Clear GPU cache between generations

### System RAM Issues

#### High RAM Usage
**Problem**: System runs out of RAM
**Solutions**:
```bash
# Limit reserved memory
python wgp.py --perc-reserved-mem-max 0.3

# Use minimal RAM profile
python wgp.py --profile 5

# Enable swap file (OS level)
```

## Performance Issues

### Slow Generation Speed

#### General Optimization
```bash
# Enable compilation (requires Triton)
python wgp.py --compile

# Use faster attention
python wgp.py --attention sage2

# Enable TeaCache
python wgp.py --teacache 2.0

# Use high-performance profile
python wgp.py --profile 3
```

#### GPU-Specific Optimizations

**RTX 10XX/20XX Series**:
```bash
python wgp.py --attention sdpa --profile 4 --teacache 1.5
```

**RTX 30XX/40XX Series**:
```bash
python wgp.py --compile --attention sage --profile 3 --teacache 2.0
```

**RTX 50XX Series**:
```bash
python wgp.py --attention sage --profile 4 --fp16
```

### Attention Mechanism Issues

#### Sage Attention Not Working
**Problem**: Sage attention fails to compile or work
**Diagnostic Steps**:
1. Check the Triton installation:
```python
import triton
print(triton.__version__)
```
2. Clear the Triton cache:
```bash
# Windows
rmdir /s %USERPROFILE%\.triton
# Linux
rm -rf ~/.triton
```
3. Fallback solution:
```bash
python wgp.py --attention sdpa
```

#### Flash Attention Issues
**Problem**: Flash attention compilation fails
**Solution**:
- Windows: Often requires manual CUDA kernel compilation
- Linux: Usually works with `pip install flash-attn`
- Fallback: Use Sage or SDPA attention

## Model-Specific Issues

### Lora Problems

#### Loras Not Loading
**Problem**: Loras don't appear in the interface
**Solutions**:
1. Check the file format (should be .safetensors, .pt, or .pth)
2. Verify the correct directory:
```
loras/          # For t2v models
loras_i2v/      # For i2v models
loras_hunyuan/  # For Hunyuan models
```
3. Click the "Refresh" button in the interface
4. Use `--check-loras` to filter incompatible files

#### Lora Compatibility Issues
**Problem**: A lora causes errors or poor results
**Solutions**:
1. Check model size compatibility (1.3B vs 14B)
2. Verify the lora was trained for your model type
3. Try different multiplier values
4. Use the `--check-loras` flag to auto-filter

### VACE-Specific Issues

#### Poor VACE Results
**Problem**: VACE generates poor quality or unexpected results
**Solutions**:
1. Enable Skip Layer Guidance
2. Use detailed prompts describing all elements
3. Ensure proper mask creation with Matanyone
4. Check reference image quality
5. Use at least 15 steps, preferably 30+

#### Matanyone Tool Issues
**Problem**: Mask creation difficulties
**Solutions**:
1. Use negative point prompts to refine the selection
2. Create multiple sub-masks and combine them
3. Try different background removal options
4. Ensure sufficient contrast in the source video

## Network and Server Issues

### Gradio Interface Problems

#### Port Already in Use
**Problem**: "Port 7860 is already in use"
**Solution**:
```bash
# Use different port
python wgp.py --server-port 7861

# Or kill existing process
# Windows
netstat -ano | findstr :7860
taskkill /PID <PID> /F

# Linux
lsof -i :7860
kill <PID>
```

#### Interface Not Loading
**Problem**: Browser shows "connection refused"
**Solutions**:
1. Check if the server started successfully
2. Try `http://127.0.0.1:7860` instead of `localhost:7860`
3. Disable the firewall temporarily
4. Use the `--listen` flag for network access

### Remote Access Issues

#### Sharing Not Working
**Problem**: The `--share` flag doesn't create a public URL
**Solutions**:
1. Check your internet connection
2. Try a different network
3. Use `--listen` with port forwarding
4. Check firewall settings

## Quality Issues

### Poor Video Quality

#### General Quality Improvements
1. Increase the number of steps (25-30+)
2. Use larger models (14B instead of 1.3B)
3. Enable Skip Layer Guidance
4. Improve prompt descriptions
5. Use higher resolution settings

#### Specific Quality Issues

**Blurry Videos**:
- Increase steps
- Check source image quality (i2v)
- Reduce the TeaCache multiplier
- Use a higher guidance scale

**Inconsistent Motion**:
- Use longer overlap in sliding windows
- Reduce window size
- Improve prompt consistency
- Check control video quality (VACE)

**Color Issues**:
- Check model compatibility
- Adjust the guidance scale
- Verify the input image color space
- Try different VAE settings

## Advanced Debugging

### Enable Verbose Output
```bash
# Maximum verbosity
python wgp.py --verbose 2

# Check lora compatibility
python wgp.py --check-loras --verbose 2
```

### Memory Debugging
```bash
# Monitor GPU memory
nvidia-smi -l 1

# Reduce memory usage
python wgp.py --profile 4 --perc-reserved-mem-max 0.2
```

### Performance Profiling
```bash
# Test different configurations
python wgp.py --attention sdpa --profile 4  # Baseline
python wgp.py --attention sage --profile 3  # Performance
python wgp.py --compile --teacache 2.0      # Maximum speed
```

## Getting Help

### Before Asking for Help
1. Check this troubleshooting guide
2. Read the relevant documentation:
   - [Installation Guide](INSTALLATION.md)
   - [Getting Started](GETTING_STARTED.md)
   - [Command Line Reference](CLI.md)
3. Try the basic fallback configuration:
```bash
python wgp.py --attention sdpa --profile 4
```

### Community Support
- **Discord Server**: https://discord.gg/g7efUW9jGV
- Provide relevant information:
  - GPU model and VRAM amount
  - Python and PyTorch versions
  - Complete error messages
  - Command used to launch WanGP
  - Operating system

### Reporting Bugs
When reporting issues:
1. Include system specifications
2. Provide complete error logs
3. List the exact steps to reproduce
4. Mention any modifications to default settings
5. Include the command line arguments used

## Emergency Fallback

If nothing works, try this minimal configuration:
```bash
# Absolute minimum setup
python wgp.py --t2v-1-3B --attention sdpa --profile 4 --teacache 0 --fp16

# If that fails, check basic PyTorch installation
python -c "import torch; print(torch.cuda.is_available())"
```

docs/VACE.md
ADDED
@@ -0,0 +1,190 @@
# VACE ControlNet Guide

VACE is a powerful ControlNet that enables Video-to-Video and Reference-to-Video generation. It allows you to inject your own images into output videos, animate characters, perform inpainting/outpainting, and continue videos.

## Overview

VACE is probably one of the most powerful Wan models available. With it, you can:
- Inject people or objects into scenes
- Animate characters
- Perform video inpainting and outpainting
- Continue existing videos
- Transfer motion from one video to another
- Change the style of scenes while preserving depth

## Getting Started

### Model Selection
1. Select either "Vace 1.3B" or "Vace 14B" from the dropdown menu
2. Note: VACE works best with videos up to 7 seconds with the Riflex option enabled

### Input Types

VACE accepts three types of visual hints (which can be combined):

#### 1. Control Video
- Transfer motion or depth to a new video
- Use only the first n frames and extrapolate the rest
- Perform inpainting, using grey (value 127) to mark mask areas; a sketch of preparing such frames follows below
- Grey areas will be filled based on the text prompt and reference images
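
As a minimal sketch (ours, using NumPy; not part of WanGP) of how a control-video frame with a grey inpainting region could be prepared:

```python
import numpy as np

def mask_region_grey(frame: np.ndarray, top: int, left: int,
                     height: int, width: int) -> np.ndarray:
    """Paint a rectangular region of an RGB frame with grey (127).

    VACE treats grey (127) areas of the control video as regions to
    regenerate from the text prompt and reference images.
    """
    out = frame.copy()
    out[top:top + height, left:left + width, :] = 127
    return out

# Example: blank out a 256x256 square in the centre of a 480x832 frame.
frame = np.zeros((480, 832, 3), dtype=np.uint8)
masked = mask_region_grey(frame, top=112, left=288, height=256, width=256)
```
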
#### 2. Reference Images
- Use as the background/setting for the video
- Inject people or objects of your choice
- Select multiple reference images
- **Tip**: Replace complex backgrounds with white for better object integration
- Always describe injected objects/people explicitly in your text prompt

#### 3. Video Mask
- Stronger control over which parts to keep (black) or replace (white)
- Perfect for inpainting/outpainting
- Example: a white mask except at the beginning/end (black) keeps the first/last frames while generating the middle content

## Common Use Cases

### Motion Transfer
**Goal**: Animate a character of your choice using motion from another video
**Setup**:
- Reference Images: Your character
- Control Video: Person performing the desired motion
- Text Prompt: Describe your character and the action

### Object/Person Injection
**Goal**: Insert people or objects into a scene
**Setup**:
- Reference Images: The people/objects to inject
- Text Prompt: Describe the scene and explicitly mention the injected elements

### Character Animation
**Goal**: Animate a character based on a text description
**Setup**:
- Control Video: Video of a person moving
- Text Prompt: Detailed description of your character

### Style Transfer with Depth
**Goal**: Change the scene style while preserving spatial relationships
**Setup**:
- Control Video: Original video (for depth information)
- Text Prompt: New style description

## Integrated Matanyone Tool

WanGP includes the Matanyone tool, specifically tuned for VACE workflows. This helps create control videos and masks simultaneously.

### Creating Face Replacement Masks
1. Load your video in Matanyone
2. Click on the face in the first frame
3. Create a mask for the face
4. Generate both the control video and the mask video with "Generate Video Matting"
5. Export to VACE with "Export to current Video Input and Video Mask"
6. Load the replacement face image in the Reference Images field

### Advanced Matanyone Tips
- **Negative Point Prompts**: Remove parts from the current selection
- **Sub Masks**: Create multiple independent masks, then combine them
- **Background Masks**: Select everything except the character (useful for background replacement)
- Enable/disable sub masks in the Matanyone settings

## Recommended Settings

### Quality Settings
- **Skip Layer Guidance**: Turn ON with the default configuration for better results
- **Long Prompts**: Use detailed descriptions, especially for background elements not in the reference images
- **Steps**: Use at least 15 steps for good quality, 30+ for best results

### Sliding Window Settings
For very long videos, configure sliding windows properly:

- **Window Size**: Set an appropriate duration for your content
- **Overlap Frames**: Long enough for motion continuity, short enough to avoid blur propagation
- **Discard Last Frames**: Remove at least 4 frames from each window (VACE 1.3B tends to blur final frames)

### Background Removal
VACE includes automatic background removal options:
- Use for reference images containing people/objects
- **Don't use** for landscape/setting reference images (the first reference image)
- Multiple background removal types are available

## Window Sliding for Long Videos

Generate videos up to 1 minute long by merging multiple windows:

### How It Works
- Each window uses the corresponding time segment of the control video
- Example: 0-4s of the control video → first window, 4-8s → second window, etc.
- Automatic overlap management ensures smooth transitions; the sketch below illustrates the frame arithmetic
|
117 |
+
### Settings
|
118 |
+
- **Window Size**: Duration of each generation window
|
119 |
+
- **Overlap Frames**: Frames shared between windows for continuity
|
120 |
+
- **Discard Last Frames**: Remove poor-quality ending frames
|
121 |
+
- **Add Overlapped Noise**: Reduce quality degradation over time
|
122 |
+
|
123 |
+
### Formula
|
124 |
+
```
|
125 |
+
Generated Frames = [Windows - 1] × [Window Size - Overlap - Discard] + Window Size
|
126 |
+
```
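
A quick sanity check of that arithmetic in Python (the numbers are illustrative, not recommended settings):

```python
def generated_frames(windows: int, window_size: int, overlap: int, discard: int) -> int:
    # The first window contributes all of its frames; every later window
    # contributes (window_size - overlap - discard) new frames.
    return (windows - 1) * (window_size - overlap - discard) + window_size

# e.g. 3 windows of 81 frames with 16 overlapped and 4 discarded per window:
print(generated_frames(3, 81, 16, 4))  # 203 frames, about 12.7 s at 16 fps
```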

### Multi-Line Prompts (Experimental)
- Each line of the prompt is used for a different window
- If there are more windows than prompt lines, the last line repeats
- Separate the lines with a carriage return
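
A hypothetical prompt for a two-window generation might look like this (each line drives one window):

```
A knight in silver armour walks through a misty forest
The knight reaches a castle gate and pushes it open
```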

## Advanced Features

### Extend Video
Click "Extend the Video Sample, Please!" during generation to add more windows dynamically.

### Noise Addition
Add noise to the overlapped frames to hide accumulated errors and quality degradation.
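
Conceptually, this perturbs the frames carried over from the previous window so the model re-synthesizes detail rather than copying accumulated artifacts. A purely illustrative sketch, not WanGP's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for 16 overlapped frames, values in [0, 1]
overlap_frames = rng.random((16, 480, 832, 3)).astype(np.float32)

# Blend in a small amount of Gaussian noise before these frames seed the next window
noise_strength = 0.05
noisy = np.clip(overlap_frames + noise_strength * rng.normal(size=overlap_frames.shape), 0.0, 1.0)
```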

### Frame Truncation
Automatically remove the lower-quality final frames from each window (recommended: 4 frames for VACE 1.3B).

## External Resources

### Official VACE Resources
- **GitHub**: https://github.com/ali-vilab/VACE/tree/main/vace/gradios
- **User Guide**: https://github.com/ali-vilab/VACE/blob/main/UserGuide.md
- **Preprocessors**: Gradio tools for preparing input materials

### Recommended External Tools
- **Annotation Tools**: For creating precise masks
- **Video Editors**: For preparing control videos
- **Background Removal**: For cleaning reference images

## Troubleshooting

### Poor Quality Results
1. Use longer, more detailed prompts
2. Enable Skip Layer Guidance
3. Increase the number of steps (30+)
4. Check reference image quality
5. Ensure proper mask creation

### Inconsistent Windows
1. Increase overlap frames
2. Use consistent prompting across windows
3. Add noise to overlapped frames
4. Reduce discarded frames if you are losing too much content

### Memory Issues
1. Use VACE 1.3B instead of 13B
2. Reduce video length or resolution
3. Decrease window size
4. Enable quantization

### Blurry Results
1. Reduce overlap frames
2. Increase discarded last frames
3. Use higher-resolution reference images
4. Check control video quality

## Tips for Best Results

1. **Detailed Prompts**: Describe everything in the scene, especially elements not in the reference images
2. **Quality Reference Images**: Use high-resolution, well-lit reference images
3. **Proper Masking**: Take time to create precise masks with Matanyone
4. **Iterative Approach**: Start with short videos, then extend successful results
5. **Background Preparation**: Remove complex backgrounds from object/person reference images
6. **Consistent Lighting**: Match the lighting between reference images and the intended scene
hyvideo/diffusion/pipelines/pipeline_hunyuan_video_audio.py
CHANGED
@@ -778,7 +778,7 @@ class HunyuanVideoAudioPipeline(DiffusionPipeline):
         # uncond_ref_latents: Union[torch.Tensor],
         pixel_value_llava: Union[torch.Tensor], # [1, 3, 336, 336]
         uncond_pixel_value_llava: Union[torch.Tensor],
-        pixel_value_ref,
+        pixel_value_ref: Union[torch.Tensor],
         face_masks: Union[torch.Tensor], # [b f h w]
         audio_prompts: Union[torch.Tensor],
         uncond_audio_prompts: Union[torch.Tensor],
wgp.py
CHANGED
@@ -43,7 +43,7 @@ AUTOSAVE_FILENAME = "queue.zip"
 PROMPT_VARS_MAX = 10
 
 target_mmgp_version = "3.4.7"
-WanGP_version = "5.
+WanGP_version = "5.4"
 prompt_enhancer_image_caption_model, prompt_enhancer_image_caption_processor, prompt_enhancer_llm_model, prompt_enhancer_llm_tokenizer = None, None, None, None
 
 from importlib.metadata import version
@@ -1255,6 +1255,14 @@ def _parse_args():
         help="default denoising steps"
     )
 
+    parser.add_argument(
+        "--teacache",
+        type=float,
+        default=-1,
+        help="teacache speed multiplier"
+    )
+
     parser.add_argument(
         "--frames",
         type=int,
@@ -1776,6 +1784,7 @@ def get_default_settings(filename):
             "flow_shift": 5,
             "tea_cache_start_step_perc": 25,
             "video_length": 129,
+            "video_prompt_type": "I",
         })
     elif get_model_type(filename) in ("vace_14B"):
         ui_defaults.update({
@@ -2874,7 +2883,9 @@ def generate_video(
     send_cmd("output")
 
     joint_pass = boost ==1 #and profile != 1 and profile != 3
-
+    # TeaCache
+    if args.teacache > 0:
+        tea_cache_setting = args.teacache
     trans.enable_teacache = tea_cache_setting > 0
     if trans.enable_teacache:
         trans.teacache_multiplier = tea_cache_setting
@@ -3555,7 +3566,10 @@ def generate_preview(latents):
 
     images = torch.nn.functional.conv3d(latents, weight, bias=bias, stride=1, padding=0, dilation=1, groups=1)
     images = images.add_(1.0).mul_(127.5)
-    images = images.detach().cpu()
+    images = images.detach().cpu()
+    if images.dtype == torch.bfloat16:
+        images = images.to(torch.float16)
+    images = images.numpy().clip(0, 255).astype(np.uint8)
     images = einops.rearrange(images, 'b c t h w -> (b h) (t w) c')
     h, w, _ = images.shape
     scale = 200 / h
@@ -5516,10 +5530,13 @@ def generate_about_tab():
     gr.Markdown("Many thanks to:")
     gr.Markdown("- <B>Alibaba Wan team for the best open source video generator")
     gr.Markdown("- <B>Alibaba Vace and Fun Teams for their incredible control net models")
+    gr.Markdown("- <B>Tencent for the impressive Hunyuan Video models")
+    gr.Markdown("- <B>Lightricks for the super fast LTX Video models")
     gr.Markdown("- <B>Cocktail Peanuts</B> : QA and simple installation via Pinokio.computer")
     gr.Markdown("- <B>Tophness</B> : created (former) multi tabs and queuing frameworks")
     gr.Markdown("- <B>AmericanPresidentJimmyCarter</B> : added original support for Skip Layer Guidance")
     gr.Markdown("- <B>Remade_AI</B> : for their awesome Loras collection")
+    gr.Markdown("- <B>Reevoy24</B> : for his repackaging / completing the documentation")
     gr.Markdown("<BR>Huge acknowlegments to these great open source projects used in WanGP:")
     gr.Markdown("- <B>Rife</B>: temporal upsampler (https://github.com/hzwer/ECCV2022-RIFE)")
     gr.Markdown("- <B>DwPose</B>: Open Pose extractor (https://github.com/IDEA-Research/DWPose)")
@@ -5528,14 +5545,25 @@ def generate_info_tab():
 
 
 def generate_info_tab():
-
-
-
-
-
-
-
-
+
+
+    with open("docs/VACE.md", "r", encoding="utf-8") as reader:
+        vace= reader.read()
+
+    with open("docs/MODELS.md", "r", encoding="utf-8") as reader:
+        models = reader.read()
+
+    with open("docs/LORAS.md", "r", encoding="utf-8") as reader:
+        loras = reader.read()
+
+    with gr.Tabs() :
+        with gr.Tab("Models", id="models"):
+            gr.Markdown(models)
+        with gr.Tab("Loras", id="loras"):
+            gr.Markdown(loras)
+        with gr.Tab("Vace", id="vace"):
+            gr.Markdown(vace)
+
 
 
 def generate_dropdown_model_list(model_filename):
@@ -5942,7 +5970,7 @@ def create_ui():
     ( state, loras_choices, lset_name, state,
     video_guide, video_mask, image_refs, video_prompt_type_video_trigger, prompt_enhancer_row
     ) = generate_video_tab(model_choice=model_choice, header=header, main = main)
-    with gr.Tab("
+    with gr.Tab("Guides", id="info"):
         generate_info_tab()
     with gr.Tab("Video Mask Creator", id="video_mask_creator") as video_mask_creator:
         from preprocessing.matanyone import app as matanyone_app
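
The new `--teacache` flag added above can be combined with any model switch; a hypothetical invocation (the multiplier value is illustrative):

```bash
# A value > 0 overrides the UI TeaCache setting (see the argparse hunk above)
python wgp.py --t2v-1-3B --teacache 2.0
```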