Reevoy committed on
Commit aefd35d · 1 Parent(s): 896a1b4

better Readme structure

Files changed (9)
  1. README.md +54 -404
  2. docs/CHANGELOG.md +148 -0
  3. docs/CLI.md +242 -0
  4. docs/GETTING_STARTED.md +194 -0
  5. docs/INSTALLATION.md +170 -0
  6. docs/LORAS.md +197 -0
  7. docs/MODELS.md +232 -0
  8. docs/TROUBLESHOOTING.md +338 -0
  9. docs/VACE.md +190 -0
README.md CHANGED
@@ -15,441 +15,91 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models
  - Loras Support to customize each model
  - Queuing system: make your shopping list of videos to generate and come back later

  **Discord Server to get Help from Other Users and show your Best Videos:** https://discord.gg/g7efUW9jGV

- ## 🔥 Latest News!!
- * May 26 2025: 👋 Wan 2.1GP v5.3 : Happy with a Video generation and want to do more generations using the same settings but you can't remember what you did or you find it too hard to copy / paste each setting one by one from the file metadata ? Rejoice ! There are now multiple ways to turn this tedious process into a one click task:
- - Select one Video recently generated in the Video Gallery and click *Use Selected Video Settings*
- - Click *Drop File Here* and select a Video you saved somewhere; if the settings metadata have been saved with the Video you will be able to extract them automatically
- - Click *Export Settings to File* to save the current settings on your hard drive. You will be able to use them again later by clicking *Drop File Here* and selecting a Settings json file this time
- * May 23 2025: 👋 Wan 2.1GP v5.21 : Improvements for Vace: better transitions between Sliding Windows, support for Image masks in Matanyone, new Extend Video for Vace, different types of automated background removal
- * May 20 2025: 👋 Wan 2.1GP v5.2 : Added support for Wan CausVid which is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps.
- The great thing is that Kijai (Kudos to him !) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model like Wan Vace 14B.
- See instructions below on how to use CausVid.\
- Also, as an experiment, I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (if you have enough VRAM (20GB...) and are ready to wait for a long time...). Don't hesitate to share your impressions on the Discord server.
- * May 18 2025: 👋 Wan 2.1GP v5.1 : Bonus Day, added LTX Video 13B Distilled: generate very high quality Videos in less than one minute !
- * May 17 2025: 👋 Wan 2.1GP v5.0 : One App to Rule Them All !\
- Added support for the other great open source architectures:
- - Hunyuan Video : text 2 video (one of the best, if not the best t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
- - LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and reduced LTX Video VRAM requirements by 4 !
-
- Also:
- - Added support for the best Control Video Model, released 2 days ago : Vace 14B
- - New integrated prompt enhancer to increase the quality of the generated videos
- You will need one more *pip install -r requirements.txt*
-
- * May 5 2025: 👋 Wan 2.1GP v4.5: FantasySpeaking model, you can animate a talking head using a voice track. This works not only on people but also on objects. Also better seamless transitions between Vace sliding windows for very long videos (see recommended settings). New high quality processing features (mixed 16/32 bits calculation and 32 bits VAE)
- * April 27 2025: 👋 Wan 2.1GP v4.4: Phantom model support, very good model to transfer people or objects into video, works quite well at 720p and with the number of steps > 30
- * April 25 2025: 👋 Wan 2.1GP v4.3: Added preview mode and support for Sky Reels v2 Diffusion Forcing for high quality "infinite length videos" (see Sliding Window section below). Note that Skyreel uses causal attention that is only supported by Sdpa attention, so even if you choose another type of attention, some of the processes will use Sdpa attention.
-
- * April 18 2025: 👋 Wan 2.1GP v4.2: FLF2V model support, official support from Wan for image2video start and end frames specialized for 720p.
- * April 17 2025: 👋 Wan 2.1GP v4.1: Recam Master model support, view a video from a different angle. The video to process must be at least 81 frames long and you should set at least 15 denoising steps to get good results.
- * April 13 2025: 👋 Wan 2.1GP v4.0: lots of goodies for you !
- - A new UI, tabs were replaced by a Dropdown box to easily switch models
- - A new queuing system that lets you stack in a queue as many text2video, image2video tasks, ... as you want. Each task can rely on completely different generation parameters (different number of frames, steps, loras, ...). Many thanks to **Tophness** for being a big contributor on this new feature
- - Temporal upsampling (Rife) and spatial upsampling (Lanczos) for a smoother video (32 fps or 64 fps) and to enlarge your video by x2 or x4. Check these new advanced options.
- - Wan Vace Control Net support : with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ... I have provided an introduction guide below.
- - Integrated *Matanyone* tool directly inside WanGP so that you can easily create the inpainting masks used in Vace
- - Sliding Window generation for Vace, create windows that can last dozens of seconds
- - New optimizations for old generation GPUs: Generate 5s (81 frames, 15 steps) of Vace 1.3B with only 5GB and in only 6 minutes on a RTX 2080Ti and 5s of t2v 14B in less than 10 minutes.
-
- * Mar 27 2025: 👋 Added support for the new Wan Fun InP models (image2video). The 14B Fun InP has probably better end image support but unfortunately existing loras do not work so well with it. The great novelty is the Fun InP image2video 1.3B model : Image 2 Video is now accessible to even lower hardware configurations. It is not as good as the 14B models but very impressive for its size. You can choose any of those models in the Configuration tab. Many thanks to the VideoX-Fun team (https://github.com/aigc-apps/VideoX-Fun)
- * Mar 26 2025: 👋 Good news ! Official support for RTX 50xx, please check the installation instructions below.
- * Mar 24 2025: 👋 Wan2.1GP v3.2:
- - Added Classifier-Free Guidance Zero Star. The video should match the text prompt better (especially with text2video) at no performance cost: many thanks to the **CFG Zero * Team:**\
- Don't hesitate to give them a star if you appreciate the results: https://github.com/WeichenFan/CFG-Zero-star
- - Added back support for Pytorch compilation with Loras. It seems it had been broken for some time
- - Added the possibility to keep a number of pregenerated videos in the Video Gallery (useful to compare outputs of different settings)
- You will need one more *pip install -r requirements.txt*
- * Mar 19 2025: 👋 Wan2.1GP v3.1: Faster launch and RAM optimizations (should require less RAM to run)\
- You will need one more *pip install -r requirements.txt*
- * Mar 18 2025: 👋 Wan2.1GP v3.0:
- - New Tab based interface, you can switch from i2v to t2v and conversely without restarting the app
- - Experimental Dual Frames mode for i2v, you can also specify an End frame. It doesn't always work, so you will need a few attempts.
- - You can save default settings in the files *i2v_settings.json* and *t2v_settings.json* that will be used when launching the app (you can also specify the path to different settings files)
- - Slight acceleration with loras\
- You will need one more *pip install -r requirements.txt*
- Many thanks to *Tophness* who created the framework (and did a big part of the work) of the multitabs and saved settings features
- * Mar 18 2025: 👋 Wan2.1GP v2.11: Added more command line parameters to prefill the generation settings + customizable output directory and choice of type of metadata for generated videos. Many thanks to *Tophness* for his contributions. You will need one more *pip install -r requirements.txt* to reflect new dependencies\
- * Mar 18 2025: 👋 Wan2.1GP v2.1: More Loras !: added support for 'Safetensors' and 'Replicate' Lora formats.\
- You will need to refresh the requirements with a *pip install -r requirements.txt*
- * Mar 17 2025: 👋 Wan2.1GP v2.0: The Lora festival continues:
- - Clearer user interface
- - Download 30 Loras in one click to try them all (expand the info section)
- - Very easy to use Loras as now Lora presets can input the subject (or other needed terms) of the Lora so that you don't have to modify a prompt manually
- - Added basic macro prompt language to prefill prompts with different values. With one prompt template, you can generate multiple prompts.
- - New Multiple images prompts: you can now combine any number of images with any number of text prompts (need to launch the app with --multiple-images)
- - New command line options to launch directly the 1.3B t2v model or the 14B t2v model
- * Mar 14, 2025: 👋 Wan2.1GP v1.7:
- - Lora Fest special edition: very fast loading / unloading of loras for those Lora collectors around. You can also now add / remove loras in the Lora folder without restarting the app. You will need to refresh the requirements *pip install -r requirements.txt*
- - Added experimental Skip Layer Guidance (advanced settings), that should improve the image quality at no extra cost. Many thanks to *AmericanPresidentJimmyCarter* for the original implementation
- * Mar 13, 2025: 👋 Wan2.1GP v1.6: Better Loras support, accelerated Lora loading. You will need to refresh the requirements *pip install -r requirements.txt*
- * Mar 10, 2025: 👋 Wan2.1GP v1.5: Official Teacache support + Smart Teacache (finds automatically the best parameters for a requested speed multiplier), 10% speed boost with no quality loss, improved lora presets (they can now include prompts and comments to guide the user)
- * Mar 07, 2025: 👋 Wan2.1GP v1.4: Fix Pytorch compilation, now it is really 20% faster when activated
- * Mar 04, 2025: 👋 Wan2.1GP v1.3: Support for Image to Video with multiple images for different images / prompts combinations (requires the *--multiple-images* switch), and added command line *--preload x* to preload in VRAM x MB of the main diffusion model if you find there is too much unused VRAM and you want to (slightly) accelerate the generation process.
- If you upgrade you will need to do a 'pip install -r requirements.txt' again.
- * Mar 04, 2025: 👋 Wan2.1GP v1.2: Implemented tiling on VAE encoding and decoding. No more VRAM peaks at the beginning and at the end
- * Mar 03, 2025: 👋 Wan2.1GP v1.1: added Tea Cache support for faster generations: optimization of kijai's implementation (https://github.com/kijai/ComfyUI-WanVideoWrapper/) of teacache (https://github.com/ali-vilab/TeaCache)
- * Mar 02, 2025: 👋 Wan2.1GP by DeepBeepMeep v1 brings:
- - Support for all Wan models including the Image to Video model
- - Memory consumption reduced by 2, with the possibility to generate more than 10s of video at 720p with a RTX 4090 and 10s of video at 480p with less than 12GB of VRAM. Many thanks to RIFLEx (https://github.com/thu-ml/RIFLEx) for their algorithm that allows generating nice looking videos longer than 5s.
- - The usual perks: web interface, multiple generations, loras support, sage attention, auto download of models, ...
-
- * Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1.
- * Feb 27, 2025: 👋 Wan2.1 has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
-
- ## Installation Guide for Linux and Windows for GPUs up to RTX40xx
-
- **If you are looking for a one click installation, just go to the Pinokio App store : https://pinokio.computer/**\
- Otherwise you will find the instructions below:
-
- This app has been tested with Python 3.10 / Pytorch 2.6.0 / Cuda 12.4.
-
- ```shell
- # 0 Download the source and create a Python 3.10.9 environment using conda or create a venv using python
- git clone https://github.com/deepbeepmeep/Wan2GP.git
- cd Wan2GP
- conda create -n wan2gp python=3.10.9
- conda activate wan2gp
-
- # 1 Install pytorch 2.6.0
- pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
-
- # 2. Install pip dependencies
- pip install -r requirements.txt
-
- # 3.1 optional Sage attention support (30% faster)
- # Windows only: this extra step is needed only on Windows, as triton is included with the Linux version of pytorch
- pip install triton-windows
- # For both Windows and Linux
- pip install sageattention==1.0.6
-
- # 3.2 optional Sage 2 attention support (40% faster)
- # Windows only
- pip install triton-windows
- pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl
- # Linux only (sorry, only manual compilation for the moment, but it is straightforward on Linux)
- git clone https://github.com/thu-ml/SageAttention
- cd SageAttention
- pip install -e .
-
- # 3.3 optional Flash attention support (easy to install on Linux but may be complex on Windows as it will try to compile the cuda kernels)
- pip install flash-attn==2.7.2.post1
- ```
-
- Note pytorch *sdpa attention* is available by default. It is worth installing *Sage attention* (although not as simple as it sounds) because it offers a 30% speed boost over *sdpa attention* at a small quality cost.
- In order to install Sage, you will also need to install Triton. If Triton is installed you can turn on *Pytorch Compilation*, which will give you an additional 20% speed boost and reduced VRAM consumption.

- ## Installation Guide for Linux and Windows for GPUs up to RTX50xx
- RTX50XX GPUs are only supported by pytorch starting from pytorch 2.7.0, which is still in beta. Therefore this version may be less stable.\
- It is important to use Python 3.10, otherwise the pip wheels may not be compatible.
- ```
- # 0 Download the source and create a Python 3.10.9 environment using conda or create a venv using python
  git clone https://github.com/deepbeepmeep/Wan2GP.git
  cd Wan2GP
  conda create -n wan2gp python=3.10.9
  conda activate wan2gp
-
- # 1 Install pytorch 2.7.0:
- pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
-
- # 2. Install pip dependencies
  pip install -r requirements.txt
-
- # 3.1 optional Sage attention support (30% faster)
- # Windows only: this extra step is needed only on Windows, as triton is included with the Linux version of pytorch
- pip install triton-windows
- # For both Windows and Linux
- pip install sageattention==1.0.6
-
- # 3.2 optional Sage 2 attention support (40% faster)
- # Windows only
- pip install triton-windows
- pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl
-
- # Linux only (sorry, only manual compilation for the moment, but it is straightforward on Linux)
- git clone https://github.com/thu-ml/SageAttention
- cd SageAttention
- pip install -e .
- ```
-
- ## Run the application
-
- ### Run a Gradio Server on port 7860 (recommended)
-
- To run the text to video generator (in Low VRAM mode):
- ```bash
- python wgp.py
- # or
- python wgp.py --t2v # launch the default Wan text 2 video model
- # or
- python wgp.py --t2v-14B # for the Wan 14B model
- # or
- python wgp.py --t2v-1-3B # for the Wan 1.3B model
- ```
-
- To run the image to video generator (in Low VRAM mode):
- ```bash
- python wgp.py --i2v
- ```
- To run the 1.3B Fun InP image to video generator (in Low VRAM mode):
- ```bash
- python wgp.py --i2v-1-3B
- ```
-
- To be able to input multiple images with the image to video generator:
- ```bash
- python wgp.py --i2v --multiple-images
- ```
-
- Within the application you can configure which video generator will be launched without specifying a command line switch.
-
- To run the application while loading the diffusion model entirely in VRAM (slightly faster but requires 24 GB of VRAM for an 8 bits quantized 14B model):
- ```bash
- python wgp.py --profile 3
  ```

- **Troubleshooting**:\
- If you have installed Sage attention, it may seem that it works because *pip install sageattention* didn't produce an error or because sage is offered as an option, but in fact it doesn't work: in order to be fully operational, Sage needs to compile its triton kernels the first time it is run (that is, the first time you try to generate a video).
-
- Sometimes fixing Sage compilation is easy (clear the triton cache, check triton is properly installed), sometimes it is simply not possible because Sage is not supported on some older GPUs.
-
- Therefore you may have no choice but to fall back to sdpa attention; to do so:
- - In the configuration menu inside the application, switch "Attention mode" to "sdpa"
- or
- - Launch the application this way:
  ```bash
- python wgp.py --attention sdpa
  ```

- ### Loras support
-
- Loras for the Wan models are stored in the subfolders 'loras' for t2v and 'loras_i2v' for i2v. You will then be able to activate / deactivate any of them when running the application by selecting them in the Advanced Tab "Loras".
-
- If you want to manage the Loras for the 1.3B and the 14B Wan t2v models in different areas (as they are not compatible), just create the following subfolders:
- - loras/1.3B
- - loras/14B
-
- You can also put all the loras in the same place by launching the app with the following command line (*path* is a path to the shared loras directory):
- ```
- python wgp.py --lora-dir path --lora-dir-i2v path
- ```
-
- Hunyuan Video and LTX Video models also have their own loras subfolders:
- - loras_hunyuan
- - loras_hunyuan_i2v
- - loras_ltxv
-
- For each activated Lora, you may specify a *multiplier*, that is, one float number that corresponds to its weight (default is 1.0). The multipliers for each Lora should be separated by a space character or a carriage return. For instance:\
- *1.2 0.8* means that the first lora will have a 1.2 multiplier and the second one will have 0.8.
-
- Alternatively, for each Lora's multiplier you may specify a list of float numbers separated by a "," (no space) that gives the evolution of this Lora's multiplier over the steps. For instance, let's assume there are 30 denoising steps and the multiplier is *0.9,0.8,0.7*; then for the step ranges 0-9, 10-19 and 20-29 the Lora multiplier will be respectively 0.9, 0.8 and 0.7.
-
- If multiple Loras are defined, remember that the multipliers associated with different Loras should be separated by a space or a carriage return, so we can specify the evolution of multipliers for multiple Loras. For instance for two Loras (press Shift Return to force a carriage return):
- ```
- 0.9,0.8,0.7
- 1.2,1.1,1.0
- ```
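As an illustration, here is a minimal sketch of how such a comma-separated schedule could map onto denoising steps (editor's example, not actual WanGP code; it assumes the values split the steps into equal ranges, as in the example above):

```python
def multiplier_for_step(schedule: str, step: int, total_steps: int) -> float:
    """Return the Lora weight to apply at a given denoising step."""
    values = [float(v) for v in schedule.split(",")]
    # Each value covers an equal share of the steps: with 30 steps and
    # "0.9,0.8,0.7", steps 0-9 -> 0.9, 10-19 -> 0.8, 20-29 -> 0.7.
    phase = min(step * len(values) // total_steps, len(values) - 1)
    return values[phase]

assert multiplier_for_step("0.9,0.8,0.7", 12, 30) == 0.8
```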
- You can edit, save or delete Loras presets (combinations of loras with their corresponding multipliers) directly from the gradio Web interface. These presets will save the *comment* part of the prompt that should contain some instructions on how to use the corresponding loras (for instance by specifying a trigger word or providing an example). A comment in the prompt is a line that starts with a #. It will be ignored by the video generator. For instance:
-
- ```
- # use the keyword ohnvx to trigger the Lora
- A ohnvx is driving a car
- ```
- Each preset is a file with the ".lset" extension stored in the loras directory and can be shared with other users.
-
- Last but not least, you can preactivate Loras and prefill a prompt (comments only or full prompt) by specifying a preset when launching the gradio server:
- ```bash
- python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a preset stored in the 'loras' folder
- ```
-
- You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
-
- ### CausVid Lora
-
- Wan CausVid is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps. Also, as a distilled model it doesn't require CFG and is two times faster for the same number of steps.
- The great thing is that Kijai (Kudos to him !) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model like Wan Vace 14B to accelerate other models too. It is possible it also works with Wan i2v models.
-
- Instructions:
- 1) First download the Lora: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
- 2) Choose a Wan t2v model (for instance Wan 2.1 text2video 14B or Vace 14B)
- 3) Turn on the Advanced Mode by checking the corresponding checkbox
- 4) In the Advanced Generation Tab: select Guidance Scale = 1, Shift Scale = 7
- 5) In the Advanced Lora Tab: select the CausVid Lora (click the Refresh button at the top if you don't see it), and enter 0.3 as the Lora multiplier
- 6) Now select a 12 steps generation and click Generate
-
- You can reduce the number of steps to as low as 4, but at the same time you will need to progressively increase the Lora multiplier up to 1. Please note the lower the number of steps, the lower the quality (especially the motion).
-
- You can combine the CausVid Lora with other Loras (just follow the instructions above)
-
- ### Macros (basic)
- In *Advanced Mode*, you can start prompt lines with a "!", for instance:\
- ```
- ! {Subject}="cat","woman","man", {Location}="forest","lake","city", {Possessive}="its", "her", "his"
- In the video, a {Subject} is presented. The {Subject} is in a {Location} and looks at {Possessive} watch.
- ```
-
- This will automatically create 3 prompts that will cause the generation of 3 videos:
- ```
- In the video, a cat is presented. The cat is in a forest and looks at its watch.
- In the video, a woman is presented. The woman is in a lake and looks at her watch.
- In the video, a man is presented. The man is in a city and looks at his watch.
- ```
-
- You can define multiple lines of macros. If there is only one macro line, the app will generate a simple user interface to enter the macro variables when getting back to *Normal Mode* (advanced mode turned off)
-
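For illustration, a minimal sketch of this macro expansion (editor's example, not WanGP's actual implementation; it assumes each variable's values pair up by position, as in the example above):

```python
import re

def expand_macro(macro_line: str, template: str) -> list[str]:
    """Expand a '!' macro line against a prompt template, pairing values by position."""
    matches = re.findall(r'\{(\w+)\}=((?:"[^"]*",?\s*)+)', macro_line)
    variables = {name: re.findall(r'"([^"]*)"', vals) for name, vals in matches}
    count = len(next(iter(variables.values())))  # number of prompts to generate
    prompts = []
    for i in range(count):
        prompt = template
        for name, values in variables.items():
            prompt = prompt.replace("{" + name + "}", values[i])
        prompts.append(prompt)
    return prompts

macro = '! {Subject}="cat","woman","man", {Location}="forest","lake","city", {Possessive}="its", "her", "his"'
template = "In the video, a {Subject} is presented. The {Subject} is in a {Location} and looks at {Possessive} watch."
for p in expand_macro(macro, template):
    print(p)  # prints the 3 expanded prompts shown above
```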
- ### VACE ControlNet introduction
-
- Vace is a ControlNet that allows you to do Video to Video and Reference to Video (inject your own images into the output video). It is probably one of the most powerful Wan models and you will be able to do amazing things when you master it: inject people or objects of your choice into the scene, animate a person, perform inpainting or outpainting, continue a video, ...
-
- First you need to select the Vace 1.3B model or the Vace 14B model in the Drop Down box at the top. Please note that for the moment Vace works well only with videos up to 7s with the Riflex option turned on.
-
- Besides the usual Text Prompt, three new types of visual hints can be provided (and combined !):
- - *a Control Video*\
- Based on your choice, you can decide to transfer the motion or the depth to a new Video. You can tell WanGP to use only the first n frames of the Control Video and to extrapolate the rest. You can also do inpainting: if the video contains areas of grey color 127, they will be considered as masks and will be filled based on the Text prompt or the Reference Images.
-
- - *Reference Images*\
- A Reference Image can be a background that you want to use as a setting for the video, or people or objects of your choice that you want to inject in the video. You can select multiple Reference Images. The integration of an object / person image is more efficient if the background is replaced by full white color. For complex background removal you can use the Image version of the Matanyone tool that is embedded with WanGP, or you can use the fast on-the-fly background remover by selecting an option in the drop down box *Remove background*. Be careful not to remove the background of a reference image that is a landscape or setting (always the first reference image) that you want to use as a start image / background for the video. It helps greatly to reference and describe explicitly the injected objects / people of the Reference Images in the text prompt.
-
- - *a Video Mask*\
- This offers a stronger mechanism to tell Vace which parts should be kept (black) or replaced (white). You can do inpainting / outpainting as well, and fill the missing parts of a video more efficiently than with just the video hint. For instance, if a video mask is white except at the beginning and at the end where it is black, the first and last frames will be kept and everything in between will be generated.
-
- Examples:
- - Inject people and / or objects into a scene described by a text prompt: Ref. Images + text Prompt
- - Animate a character described in a text prompt: a Video of a person moving + text Prompt
- - Animate a character of your choice (motion transfer): Ref. Images + a Video of a person moving + text Prompt
- - Change the style of a scene (depth transfer): a Video that contains objects / persons at different depths + text Prompt
-
- There are lots of possible combinations. Some of them require preparing some materials (masks on top of video, full masks, etc...).
-
- Vace provides on its github (https://github.com/ali-vilab/VACE/tree/main/vace/gradios) annotator / preprocessor Gradio tools that can help you build some of these materials depending on the task you want to achieve.
-
- There is also a guide that describes the various combinations of hints (https://github.com/ali-vilab/VACE/blob/main/UserGuide.md). Good luck !
-
- It seems you will get better results with Vace if you turn on "Skip Layer Guidance" with its default configuration.
-
- Other recommended settings for Vace:
- - Use a long prompt description, especially for the people / objects that are in the background and not in reference images. This will ensure consistency between the windows.
- - Set a medium size overlap window: long enough to give the model a sense of the motion but short enough so any overlapped blurred frames do not turn the rest of the video into a blurred video
- - Truncate at least the last 4 frames of each generated window, as Vace's last frames tend to be blurry
-
- **WanGP integrates the Matanyone tool which is tuned to work with Vace**.
-
- This can be very useful to create at the same time a control video and a mask video that go together.\
- For example, if you want to replace the face of a person in a video:
- - load the video in the Matanyone tool
- - click the face on the first frame and create a mask for it (if you have trouble selecting only the face, look at the tips below)
- - generate both the control video and the mask video by clicking *Generate Video Matting*
- - click *Export to current Video Input and Video Mask*
- - in the *Reference Image* field of the Vace screen, load a picture of the replacement face
-
- Please note that sometimes it may be useful to create *Background Masks*, if for instance you want to replace everything but a character that is in the video. You can do that by selecting *Background Mask* in the *Matanyone settings*.
-
- If you have some trouble creating the perfect mask, be aware of these tips:
- - Using the Matanyone Settings you can also define Negative Point Prompts to remove parts of the current selection.
- - Sometimes it is very hard to fit everything you want in a single mask; it may be much easier to combine multiple independent sub Masks before producing the Matting: each sub Mask is created by selecting an area of an image and by clicking the Add Mask button. Sub Masks can then be enabled / disabled in the Matanyone settings.
-
- ### VACE, Sky Reels v2 Diffusion Forcing Sliding Window and LTX Video
- With this mode (which works for the moment only with Vace, Sky Reels v2 and LTX Video) you can merge multiple Videos to form a very long video (up to 1 min).
-
- When combined with Vace, this feature can use the same control video to generate the full Video that results from concatenating the different windows. For instance the first 0-4s of the control video will be used to generate the first window, then the next 4-8s of the control video will be used to generate the second window, and so on. So if your control video contains a person walking, your generated video could contain up to one minute of this person walking.
-
- When combined with Sky Reels V2, you can extend an existing video indefinitely.
-
- Sliding Windows are turned on by default and are triggered as soon as you try to generate a Video longer than the Window Size. You can go to the Advanced Settings Tab *Sliding Window* to set this Window Size. You can make the Video even longer during the generation process by adding one more Window to generate each time you click the "Extend the Video Sample, Please !" button.
-
- Although the window duration is set by the *Sliding Window Size* form field, the actual number of frames generated by each iteration will be less, because of the *overlap frames* and *discard last frames*:
- - *overlap frames*: the first frames of a new window are filled with the last frames of the previous window in order to ensure continuity between the two windows
- - *discard last frames*: sometimes (Vace 1.3B model only) the last frames of a window have a worse quality. You can decide here how many ending frames of a new window should be dropped.
-
- There is some inevitable quality degradation over time due to accumulated errors in calculation. One trick to reduce / hide it is to add some noise (usually not noticeable) on the overlapped frames using the *add overlapped noise* option.
-
- Number of Generated Frames = [Number of Windows - 1] * ([Window Size] - [Overlap Frames] - [Discard Last Frames]) + [Window Size]
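A quick worked example (illustrative numbers, not defaults): with 3 windows of size 81, 8 overlap frames and 4 discarded frames, the formula gives (3 - 1) * (81 - 8 - 4) + 81 = 2 * 69 + 81 = 219 generated frames.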
-
- Experimental: if your prompt is broken into multiple lines (each line separated by a carriage return), then each line of the prompt will be used for a new window. If there are more windows to generate than prompt lines, the last prompt line will be repeated.
-
- ### Command line parameters for Gradio Server
- --i2v : launch the image to video generator\
- --t2v : launch the text to video generator (default defined in the configuration)\
- --t2v-14B : launch the 14B model text to video generator\
- --t2v-1-3B : launch the 1.3B model text to video generator\
- --i2v-14B : launch the 14B model image to video generator\
- --i2v-1-3B : launch the Fun InP 1.3B model image to video generator\
- --vace : launch the Vace ControlNet 1.3B model image to video generator\
- --quantize-transformer bool : (default True) enable / disable on the fly transformer quantization\
- --lora-dir path : path of the directory that contains Wan t2v Loras\
- --lora-dir-i2v path : path of the directory that contains Wan i2v Loras\
- --lora-dir-hunyuan path : path of the directory that contains Hunyuan t2v Loras\
- --lora-dir-hunyuan-i2v path : path of the directory that contains Hunyuan i2v Loras\
- --lora-dir-ltxv path : path of the directory that contains LTX Video Loras\
- --lora-preset preset : name of the preset file (without the extension) to preload\
- --verbose level : default (1) : level of information between 0 and 2\
- --server-port portno : default (7860) : Gradio port no\
- --server-name name : default (localhost) : Gradio server name\
- --open-browser : open the Browser automatically when launching the Gradio Server\
- --lock-config : prevent modifying the video engine configuration from the interface\
- --share : create a shareable URL on huggingface so that your server can be accessed remotely\
- --multiple-images : allow the users to choose multiple images as different starting points for new videos\
- --compile : turn on pytorch compilation\
- --attention mode : force attention mode among sdpa, flash, sage, sage2\
- --profile no : default (4) : no of profile between 1 and 5\
- --preload no : number of Megabytes of the main diffusion model to preload in VRAM; may offer speed gains on older hardware, on recent hardware (RTX 30XX, RTX 40XX and RTX 50XX) the speed gain is only 10% and not worth it. Works only with profiles 2 and 4.\
- --seed no : set default seed value\
- --frames no : set the default number of frames to generate\
- --steps no : set the default number of denoising steps\
- --teacache speed multiplier : Tea Cache speed multiplier, choices=["0", "1.5", "1.75", "2.0", "2.25", "2.5"]\
- --slg : turn on skip layer guidance for improved quality\
- --check-loras : filter loras that are incompatible (will take a few seconds while refreshing the lora list or while starting the app)\
- --advanced : turn on the advanced mode while launching the app\
- --listen : make the server accessible on the network\
- --gpu device : run Wan on the given device, for instance "cuda:1"\
- --settings path : path to a folder that contains the default settings for all the models\
- --fp16 : force the use of fp16 versions of models instead of bf16 versions\
- --perc-reserved-mem-max float_less_than_1 : max percentage of RAM to allocate to reserved RAM, allows faster RAM<->VRAM transfers. Value should remain below 0.5 to keep the OS stable\
- --theme theme_name : load the UI with the specified Theme Name; so far only two are supported, "default" and "gradio". You may submit your own nice looking Gradio theme and I will add it
-
- ### Profiles (for power users only)
- You can choose between 5 profiles, but two are really relevant here:
- - LowRAM_HighVRAM (3): loads the model entirely in VRAM if possible, slightly faster, but less VRAM available for the video data after that
- - LowRAM_LowVRAM (4): loads only the part of the model that is needed, low VRAM and low RAM requirements but slightly slower
-
- You can adjust the number of megabytes of the model to preload with --preload nnn (nnn is the number of megabytes to preload)
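For instance (flags as documented above; the 2000 MB figure is just an illustration):

```bash
python wgp.py --profile 3                  # load the model fully in VRAM
python wgp.py --profile 4 --preload 2000   # load on demand, but preload 2000 MB
```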
  ### Other Models for the GPU Poor

- - HunyuanVideoGP: https://github.com/deepbeepmeep/HunyuanVideoGP :\
- One of the best open source Text to Video generators
-
- - Hunyuan3D-2GP: https://github.com/deepbeepmeep/Hunyuan3D-2GP :\
- A great image to 3D and text to 3D tool by the Tencent team. Thanks to mmgp it can run with less than 6 GB of VRAM
-
- - FluxFillGP: https://github.com/deepbeepmeep/FluxFillGP :\
- One of the best inpainting / outpainting tools based on Flux that can run with less than 12 GB of VRAM.
-
- - Cosmos1GP: https://github.com/deepbeepmeep/Cosmos1GP :\
- This application includes two models: a text to world generator and an image / video to world generator (probably the best open source image to video generator).
-
- - OminiControlGP: https://github.com/deepbeepmeep/OminiControlGP :\
- A very powerful Flux derived application that can be used to transfer an object of your choice into a prompted scene. With mmgp you can run it with only 6 GB of VRAM.
-
- - YuE GP: https://github.com/deepbeepmeep/YuEGP :\
- A great song generator (instruments + singer's voice) based on prompted Lyrics and a genre description. Thanks to mmgp you can run it with less than 10 GB of VRAM without waiting forever.
-
  - Loras Support to customize each model
  - Queuing system: make your shopping list of videos to generate and come back later

  **Discord Server to get Help from Other Users and show your Best Videos:** https://discord.gg/g7efUW9jGV

+ ## 🔥 Latest Updates

+ ### May 26, 2025: Wan 2.1GP v5.3
+ 👋 Settings management revolution! Now you can:
+ - Select any generated video and click *Use Selected Video Settings* to instantly reuse its configuration
+ - Drag & drop videos to automatically extract their settings metadata
+ - Export/import settings as JSON files for easy sharing and backup

+ ### May 23, 2025: Wan 2.1GP v5.21
+ 👋 VACE improvements: Better sliding window transitions, image mask support in Matanyone, new Extend Video feature, and enhanced background removal options.

+ ### May 20, 2025: Wan 2.1GP v5.2
+ 👋 **CausVid support** - Generate videos in just 4-12 steps with the new distilled Wan model! Also added experimental MoviiGen for 1080p generation (20GB+ VRAM required).

+ ### May 18, 2025: Wan 2.1GP v5.1
+ 👋 **LTX Video 13B Distilled** - Generate high-quality videos in less than one minute!

+ ### May 17, 2025: Wan 2.1GP v5.0
+ 👋 **One App to Rule Them All!** Added Hunyuan Video and LTX Video support, plus Vace 14B and an integrated prompt enhancer.

+ See the full changelog: **[Changelog](docs/CHANGELOG.md)**

+ ## 📋 Table of Contents

+ - [🚀 Quick Start](#-quick-start)
+ - [📦 Installation](#-installation)
+ - [🎯 Usage](#-usage)
+ - [📚 Documentation](#-documentation)
+ - [🔗 Related Projects](#-related-projects)

+ ## 🚀 Quick Start

+ **One-click installation:** Get started instantly with the [Pinokio App](https://pinokio.computer/)

+ **Manual installation:**
+ ```bash
  git clone https://github.com/deepbeepmeep/Wan2GP.git
  cd Wan2GP
  conda create -n wan2gp python=3.10.9
  conda activate wan2gp
+ pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
  pip install -r requirements.txt
  ```

+ **Run the application:**
  ```bash
+ python wgp.py # Text-to-video (default)
+ python wgp.py --i2v # Image-to-video
  ```

+ ## 📦 Installation

+ For detailed installation instructions for different GPU generations:
+ - **[Installation Guide](docs/INSTALLATION.md)** - Complete setup instructions for RTX 10XX to RTX 50XX

+ ## 🎯 Usage

+ ### Basic Usage
+ - **[Getting Started Guide](docs/GETTING_STARTED.md)** - First steps and basic usage
+ - **[Models Overview](docs/MODELS.md)** - Available models and their capabilities

+ ### Advanced Features
+ - **[Loras Guide](docs/LORAS.md)** - Using and managing Loras for customization
+ - **[VACE ControlNet](docs/VACE.md)** - Advanced video control and manipulation
+ - **[Command Line Reference](docs/CLI.md)** - All available command line options

+ ## 📚 Documentation

+ - **[Changelog](docs/CHANGELOG.md)** - Latest updates and version history
+ - **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions

+ ## 🔗 Related Projects

  ### Other Models for the GPU Poor
+ - **[HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP)** - One of the best open source Text to Video generators
+ - **[Hunyuan3D-2GP](https://github.com/deepbeepmeep/Hunyuan3D-2GP)** - Image to 3D and text to 3D tool
+ - **[FluxFillGP](https://github.com/deepbeepmeep/FluxFillGP)** - Inpainting/outpainting tools based on Flux
+ - **[Cosmos1GP](https://github.com/deepbeepmeep/Cosmos1GP)** - Text to world generator and image/video to world generator
+ - **[OminiControlGP](https://github.com/deepbeepmeep/OminiControlGP)** - Flux-derived application for object transfer
+ - **[YuE GP](https://github.com/deepbeepmeep/YuEGP)** - Song generator with instruments and singer's voice

+ ---

+ <p align="center">
+ Made with ❤️ by DeepBeepMeep
+ </p>
docs/CHANGELOG.md ADDED
@@ -0,0 +1,148 @@
+ # Changelog
+
+ ## 🔥 Latest News
+
+ ### May 26, 2025: Wan 2.1GP v5.3
+ 👋 Happy with a Video generation and want to do more generations using the same settings but you can't remember what you did or you find it too hard to copy/paste each setting one by one from the file metadata? Rejoice! There are now multiple ways to turn this tedious process into a one click task:
+ - Select one Video recently generated in the Video Gallery and click *Use Selected Video Settings*
+ - Click *Drop File Here* and select a Video you saved somewhere; if the settings metadata have been saved with the Video you will be able to extract them automatically
+ - Click *Export Settings to File* to save the current settings on your hard drive. You will be able to use them again later by clicking *Drop File Here* and selecting a Settings json file this time
+
+ ### May 23, 2025: Wan 2.1GP v5.21
+ 👋 Improvements for Vace: better transitions between Sliding Windows, support for Image masks in Matanyone, new Extend Video for Vace, different types of automated background removal
+
+ ### May 20, 2025: Wan 2.1GP v5.2
+ 👋 Added support for Wan CausVid which is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps. The great thing is that Kijai (Kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model like Wan Vace 14B. See [LORAS.md](LORAS.md) for instructions on how to use CausVid.
+
+ Also as an experiment I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (if you have enough VRAM (20GB...) and are ready to wait for a long time...). Don't hesitate to share your impressions on the Discord server.
+
+ ### May 18, 2025: Wan 2.1GP v5.1
+ 👋 Bonus Day, added LTX Video 13B Distilled: generate very high quality Videos in less than one minute!
+
+ ### May 17, 2025: Wan 2.1GP v5.0
+ 👋 One App to Rule Them All! Added support for the other great open source architectures:
+ - **Hunyuan Video**: text 2 video (one of the best, if not the best t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
+ - **LTX Video 13B** (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and reduced LTX Video VRAM requirements by 4!
+
+ Also:
+ - Added support for the best Control Video Model, released 2 days ago: Vace 14B
+ - New integrated prompt enhancer to increase the quality of the generated videos
+
+ *You will need one more `pip install -r requirements.txt`*
+
+ ### May 5, 2025: Wan 2.1GP v4.5
+ 👋 FantasySpeaking model, you can animate a talking head using a voice track. This works not only on people but also on objects. Also better seamless transitions between Vace sliding windows for very long videos. New high quality processing features (mixed 16/32 bits calculation and 32 bits VAE)
+
+ ### April 27, 2025: Wan 2.1GP v4.4
+ 👋 Phantom model support, very good model to transfer people or objects into video, works quite well at 720p and with the number of steps > 30
+
+ ### April 25, 2025: Wan 2.1GP v4.3
+ 👋 Added preview mode and support for Sky Reels v2 Diffusion Forcing for high quality "infinite length videos". Note that Skyreel uses causal attention that is only supported by Sdpa attention, so even if you choose another type of attention, some of the processes will use Sdpa attention.
+
+ ### April 18, 2025: Wan 2.1GP v4.2
+ 👋 FLF2V model support, official support from Wan for image2video start and end frames specialized for 720p.
+
+ ### April 17, 2025: Wan 2.1GP v4.1
+ 👋 Recam Master model support, view a video from a different angle. The video to process must be at least 81 frames long and you should set at least 15 denoising steps to get good results.
+
+ ### April 13, 2025: Wan 2.1GP v4.0
+ 👋 Lots of goodies for you!
+ - A new UI, tabs were replaced by a Dropdown box to easily switch models
+ - A new queuing system that lets you stack in a queue as many text2video, image2video tasks, ... as you want. Each task can rely on completely different generation parameters (different number of frames, steps, loras, ...). Many thanks to **Tophness** for being a big contributor on this new feature
+ - Temporal upsampling (Rife) and spatial upsampling (Lanczos) for a smoother video (32 fps or 64 fps) and to enlarge your video by x2 or x4. Check these new advanced options.
+ - Wan Vace Control Net support: with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ... See [VACE.md](VACE.md) for an introduction guide.
+ - Integrated *Matanyone* tool directly inside WanGP so that you can easily create the inpainting masks used in Vace
+ - Sliding Window generation for Vace, create windows that can last dozens of seconds
+ - New optimizations for old generation GPUs: Generate 5s (81 frames, 15 steps) of Vace 1.3B with only 5GB and in only 6 minutes on a RTX 2080Ti and 5s of t2v 14B in less than 10 minutes.
+
+ ### March 27, 2025
+ 👋 Added support for the new Wan Fun InP models (image2video). The 14B Fun InP has probably better end image support but unfortunately existing loras do not work so well with it. The great novelty is the Fun InP image2video 1.3B model: Image 2 Video is now accessible to even lower hardware configurations. It is not as good as the 14B models but very impressive for its size. Many thanks to the VideoX-Fun team (https://github.com/aigc-apps/VideoX-Fun)
+
+ ### March 26, 2025
+ 👋 Good news! Official support for RTX 50xx, please check the [installation instructions](INSTALLATION.md).
+
+ ### March 24, 2025: Wan2.1GP v3.2
+ 👋
+ - Added Classifier-Free Guidance Zero Star. The video should match the text prompt better (especially with text2video) at no performance cost: many thanks to the **CFG Zero * Team**. Don't hesitate to give them a star if you appreciate the results: https://github.com/WeichenFan/CFG-Zero-star
+ - Added back support for PyTorch compilation with Loras. It seems it had been broken for some time
+ - Added the possibility to keep a number of pregenerated videos in the Video Gallery (useful to compare outputs of different settings)
+
+ *You will need one more `pip install -r requirements.txt`*
+
+ ### March 19, 2025: Wan2.1GP v3.1
+ 👋 Faster launch and RAM optimizations (should require less RAM to run)
+
+ *You will need one more `pip install -r requirements.txt`*
+
+ ### March 18, 2025: Wan2.1GP v3.0
+ 👋
+ - New Tab based interface, you can switch from i2v to t2v and conversely without restarting the app
+ - Experimental Dual Frames mode for i2v, you can also specify an End frame. It doesn't always work, so you will need a few attempts.
+ - You can save default settings in the files *i2v_settings.json* and *t2v_settings.json* that will be used when launching the app (you can also specify the path to different settings files)
+ - Slight acceleration with loras
+
+ *You will need one more `pip install -r requirements.txt`*
+
+ Many thanks to *Tophness* who created the framework (and did a big part of the work) of the multitabs and saved settings features
+
+ ### March 18, 2025: Wan2.1GP v2.11
+ 👋 Added more command line parameters to prefill the generation settings + customizable output directory and choice of type of metadata for generated videos. Many thanks to *Tophness* for his contributions.
+
+ *You will need one more `pip install -r requirements.txt` to reflect new dependencies*
+
+ ### March 18, 2025: Wan2.1GP v2.1
+ 👋 More Loras!: added support for 'Safetensors' and 'Replicate' Lora formats.
+
+ *You will need to refresh the requirements with a `pip install -r requirements.txt`*
+
+ ### March 17, 2025: Wan2.1GP v2.0
+ 👋 The Lora festival continues:
+ - Clearer user interface
+ - Download 30 Loras in one click to try them all (expand the info section)
+ - Very easy to use Loras as now Lora presets can input the subject (or other needed terms) of the Lora so that you don't have to modify a prompt manually
+ - Added basic macro prompt language to prefill prompts with different values. With one prompt template, you can generate multiple prompts.
+ - New Multiple images prompts: you can now combine any number of images with any number of text prompts (need to launch the app with --multiple-images)
+ - New command line options to launch directly the 1.3B t2v model or the 14B t2v model
+
+ ### March 14, 2025: Wan2.1GP v1.7
+ 👋
+ - Lora Fest special edition: very fast loading/unloading of loras for those Lora collectors around. You can also now add/remove loras in the Lora folder without restarting the app.
+ - Added experimental Skip Layer Guidance (advanced settings), that should improve the image quality at no extra cost. Many thanks to *AmericanPresidentJimmyCarter* for the original implementation
+
+ *You will need to refresh the requirements `pip install -r requirements.txt`*
+
+ ### March 13, 2025: Wan2.1GP v1.6
+ 👋 Better Loras support, accelerated Lora loading.
+
+ *You will need to refresh the requirements `pip install -r requirements.txt`*
+
+ ### March 10, 2025: Wan2.1GP v1.5
+ 👋 Official Teacache support + Smart Teacache (finds automatically the best parameters for a requested speed multiplier), 10% speed boost with no quality loss, improved lora presets (they can now include prompts and comments to guide the user)
+
+ ### March 7, 2025: Wan2.1GP v1.4
+ 👋 Fix PyTorch compilation, now it is really 20% faster when activated
+
+ ### March 4, 2025: Wan2.1GP v1.3
+ 👋 Support for Image to Video with multiple images for different images/prompts combinations (requires the *--multiple-images* switch), and added command line *--preload x* to preload in VRAM x MB of the main diffusion model if you find there is too much unused VRAM and you want to (slightly) accelerate the generation process.
+
+ *If you upgrade you will need to do a `pip install -r requirements.txt` again.*
+
+ ### March 4, 2025: Wan2.1GP v1.2
+ 👋 Implemented tiling on VAE encoding and decoding. No more VRAM peaks at the beginning and at the end
+
+ ### March 3, 2025: Wan2.1GP v1.1
+ 👋 Added Tea Cache support for faster generations: optimization of kijai's implementation (https://github.com/kijai/ComfyUI-WanVideoWrapper/) of teacache (https://github.com/ali-vilab/TeaCache)
+
+ ### March 2, 2025: Wan2.1GP by DeepBeepMeep v1
+ 👋 Brings:
+ - Support for all Wan models including the Image to Video model
+ - Memory consumption reduced by 2, with the possibility to generate more than 10s of video at 720p with a RTX 4090 and 10s of video at 480p with less than 12GB of VRAM. Many thanks to RIFLEx (https://github.com/thu-ml/RIFLEx) for their algorithm that allows generating nice looking videos longer than 5s.
+ - The usual perks: web interface, multiple generations, loras support, sage attention, auto download of models, ...
+
+ ## Original Wan Releases
+
+ ### February 25, 2025
+ 👋 We've released the inference code and weights of Wan2.1.
+
+ ### February 27, 2025
+ 👋 Wan2.1 has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
docs/CLI.md ADDED
@@ -0,0 +1,242 @@
+ # Command Line Reference
+
+ This document covers all available command line options for WanGP.
+
+ ## Basic Usage
+
+ ```bash
+ # Default launch (text-to-video)
+ python wgp.py
+
+ # Specific model modes
+ python wgp.py --i2v # Image-to-video
+ python wgp.py --t2v # Text-to-video (default)
+ python wgp.py --t2v-14B # 14B text-to-video model
+ python wgp.py --t2v-1-3B # 1.3B text-to-video model
+ python wgp.py --i2v-14B # 14B image-to-video model
+ python wgp.py --i2v-1-3B # Fun InP 1.3B image-to-video model
+ python wgp.py --vace # VACE ControlNet 1.3B model
+ ```
+
+ ## Model and Performance Options
+
+ ### Model Configuration
+ ```bash
+ --quantize-transformer BOOL # Enable/disable transformer quantization (default: True)
+ --compile # Enable PyTorch compilation (requires Triton)
+ --attention MODE # Force attention mode: sdpa, flash, sage, sage2
+ --profile NUMBER # Performance profile 1-5 (default: 4)
+ --preload NUMBER # Preload N MB of diffusion model in VRAM
+ --fp16 # Force fp16 instead of bf16 models
+ --gpu DEVICE # Run on specific GPU device (e.g., "cuda:1")
+ ```
+
+ ### Performance Profiles
+ - **Profile 1**: Maximum RAM usage, minimum VRAM
+ - **Profile 2**: Balanced RAM/VRAM usage
+ - **Profile 3 (LowRAM_HighVRAM)**: Load entire model in VRAM (requires 24GB for 14B model)
+ - **Profile 4 (LowRAM_LowVRAM)**: Default, load model parts as needed
+ - **Profile 5**: Minimum RAM usage
+
+ ### Memory Management
+ ```bash
+ --perc-reserved-mem-max FLOAT # Max percentage of RAM for reserved memory (< 0.5)
+ ```
+
+ ## Lora Configuration
+
+ ```bash
+ --lora-dir PATH # Path to Wan t2v loras directory
+ --lora-dir-i2v PATH # Path to Wan i2v loras directory
+ --lora-dir-hunyuan PATH # Path to Hunyuan t2v loras directory
+ --lora-dir-hunyuan-i2v PATH # Path to Hunyuan i2v loras directory
+ --lora-dir-ltxv PATH # Path to LTX Video loras directory
+ --lora-preset PRESET # Load lora preset file (.lset) on startup
+ --check-loras # Filter incompatible loras (slower startup)
+ ```
+
+ ## Generation Settings
+
+ ### Basic Generation
+ ```bash
+ --seed NUMBER # Set default seed value
+ --frames NUMBER # Set default number of frames to generate
+ --steps NUMBER # Set default number of denoising steps
+ --advanced # Launch with advanced mode enabled
+ ```
+
+ ### Advanced Generation
+ ```bash
+ --teacache MULTIPLIER # TeaCache speed multiplier: 0, 1.5, 1.75, 2.0, 2.25, 2.5
+ --slg # Enable Skip Layer Guidance for improved quality
+ ```
+
+ ## Interface and Server Options
+
+ ### Server Configuration
+ ```bash
+ --server-port PORT # Gradio server port (default: 7860)
+ --server-name NAME # Gradio server name (default: localhost)
+ --listen # Make server accessible on network
+ --share # Create shareable HuggingFace URL for remote access
+ --open-browser # Open browser automatically when launching
+ ```
+
+ ### Interface Options
+ ```bash
+ --multiple-images # Allow multiple image inputs for different starting points
+ --lock-config # Prevent modifying video engine configuration from interface
+ --theme THEME_NAME # UI theme: "default" or "gradio"
+ ```
+
+ ## File and Directory Options
+
+ ```bash
+ --settings PATH # Path to folder containing default settings for all models
+ --verbose LEVEL # Information level 0-2 (default: 1)
+ ```
+
+ ## Examples
+
+ ### Basic Usage Examples
+ ```bash
+ # Launch with specific model and loras
+ python wgp.py --t2v-14B --lora-preset mystyle.lset
+
+ # High-performance setup with compilation
+ python wgp.py --compile --attention sage2 --profile 3
+
+ # Low VRAM setup
+ python wgp.py --t2v-1-3B --profile 4 --attention sdpa
+
+ # Multiple images with custom lora directory
+ python wgp.py --i2v --multiple-images --lora-dir /path/to/shared/loras
+ ```
+
+ ### Server Configuration Examples
+ ```bash
+ # Network accessible server
+ python wgp.py --listen --server-port 8080
+
+ # Shareable server with custom theme
+ python wgp.py --share --theme gradio --open-browser
+
+ # Locked configuration for public use
+ python wgp.py --lock-config --share
+ ```
+
+ ### Advanced Performance Examples
+ ```bash
+ # Maximum performance (requires high-end GPU)
+ python wgp.py --compile --attention sage2 --profile 3 --preload 2000
+
+ # Optimized for RTX 2080Ti
+ python wgp.py --profile 4 --attention sdpa --teacache 2.0
+
+ # Memory-efficient setup
+ python wgp.py --fp16 --profile 4 --perc-reserved-mem-max 0.3
+ ```
+
+ ### TeaCache Configuration
+ ```bash
+ # Different speed multipliers
+ python wgp.py --teacache 1.5 # 1.5x speed, minimal quality loss
+ python wgp.py --teacache 2.0 # 2x speed, some quality loss
+ python wgp.py --teacache 2.5 # 2.5x speed, noticeable quality loss
+ python wgp.py --teacache 0 # Disable TeaCache
+ ```
+
+ ## Attention Modes
+
+ ### SDPA (Default)
+ ```bash
+ python wgp.py --attention sdpa
+ ```
+ - Available by default with PyTorch
+ - Good compatibility with all GPUs
+ - Moderate performance
+
+ ### Sage Attention
+ ```bash
+ python wgp.py --attention sage
+ ```
+ - Requires Triton installation
+ - 30% faster than SDPA
+ - Small quality cost
+
+ ### Sage2 Attention
+ ```bash
+ python wgp.py --attention sage2
+ ```
+ - Requires Triton and SageAttention 2.x
+ - 40% faster than SDPA
+ - Best performance option
+
+ ### Flash Attention
+ ```bash
+ python wgp.py --attention flash
+ ```
+ - May require CUDA kernel compilation
+ - Good performance
+ - Can be complex to install on Windows
+
+ ## Troubleshooting Command Lines
+
+ ### Fallback to Basic Setup
+ ```bash
+ # If advanced features don't work
+ python wgp.py --attention sdpa --profile 4 --fp16
+ ```
+
+ ### Debug Mode
+ ```bash
+ # Maximum verbosity for troubleshooting
+ python wgp.py --verbose 2 --check-loras
+ ```
197
+ ### Memory Issue Debugging
198
+ ```bash
199
+ # Minimal memory usage
200
+ python wgp.py --profile 4 --attention sdpa --perc-reserved-mem-max 0.2
201
+ ```
202
+
203
+ ### GPU-specific Configurations
204
+ ```bash
205
+ # RTX 10XX/20XX series
206
+ python wgp.py --attention sdpa --profile 4 --fp16
207
+
208
+ # RTX 30XX/40XX series
209
+ python wgp.py --attention sage --compile --profile 3
210
+
211
+ # RTX 50XX series (beta)
212
+ python wgp.py --attention sage --fp16 --profile 4
213
+ ```
214
+
215
+ ## Configuration Files
216
+
217
+ ### Settings Files
218
+ You can save default settings in JSON files:
219
+ - `i2v_settings.json` - Image-to-video default settings
220
+ - `t2v_settings.json` - Text-to-video default settings
221
+
222
+ Load custom settings:
223
+ ```bash
224
+ python wgp.py --settings /path/to/settings/folder
225
+ ```
226
+
227
+ ### Lora Presets
228
+ Create and share lora configurations:
229
+ ```bash
230
+ # Load specific preset
231
+ python wgp.py --lora-preset anime_style.lset
232
+
233
+ # With custom lora directory
234
+ python wgp.py --lora-preset mystyle.lset --lora-dir /shared/loras
235
+ ```
236
+
237
+ ## Environment Variables
238
+
239
+ While not command line options, these environment variables can affect behavior (an example follows the list):
240
+ - `CUDA_VISIBLE_DEVICES` - Limit visible GPUs
241
+ - `PYTORCH_CUDA_ALLOC_CONF` - CUDA memory allocation settings
242
+ - `TRITON_CACHE_DIR` - Triton cache directory (for Sage attention)
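+
+ For example, combined in a single launch (illustrative values; `CUDA_VISIBLE_DEVICES` and `PYTORCH_CUDA_ALLOC_CONF` are standard CUDA/PyTorch variables, not WanGP flags):
+
+ ```bash
+ # Restrict WanGP to the first GPU and cap the CUDA allocator's split size
+ CUDA_VISIBLE_DEVICES=0 PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python wgp.py
+ ```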
docs/GETTING_STARTED.md ADDED
@@ -0,0 +1,194 @@
1
+ # Getting Started with WanGP
2
+
3
+ This guide will help you get started with WanGP video generation quickly and easily.
4
+
5
+ ## Prerequisites
6
+
7
+ Before starting, ensure you have:
8
+ - A compatible GPU (RTX 10XX or newer recommended)
9
+ - Python 3.10.9 installed
10
+ - At least 6GB of VRAM for basic models
11
+ - Internet connection for model downloads
12
+
13
+ ## Quick Setup
14
+
15
+ ### Option 1: One-Click Installation (Recommended)
16
+ Use [Pinokio App](https://pinokio.computer/) for the easiest installation experience.
17
+
18
+ ### Option 2: Manual Installation
19
+ ```bash
20
+ git clone https://github.com/deepbeepmeep/Wan2GP.git
21
+ cd Wan2GP
22
+ conda create -n wan2gp python=3.10.9
23
+ conda activate wan2gp
24
+ pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
25
+ pip install -r requirements.txt
26
+ ```
27
+
28
+ For detailed installation instructions, see [INSTALLATION.md](INSTALLATION.md).
29
+
30
+ ## First Launch
31
+
32
+ ### Basic Launch
33
+ ```bash
34
+ python wgp.py
35
+ ```
36
+ This launches the text-to-video generator with default settings.
37
+
38
+ ### Alternative Modes
39
+ ```bash
40
+ python wgp.py --i2v # Image-to-video mode
41
+ python wgp.py --t2v-1-3B # Smaller, faster model
42
+ ```
43
+
44
+ ## Understanding the Interface
45
+
46
+ When you launch WanGP, you'll see a web interface with several sections:
47
+
48
+ ### Main Generation Panel
49
+ - **Model Selection**: Dropdown to choose between different models
50
+ - **Prompt**: Text description of what you want to generate
51
+ - **Generate Button**: Start the video generation process
52
+
53
+ ### Advanced Settings (click checkbox to enable)
54
+ - **Generation Settings**: Steps, guidance, seeds
55
+ - **Loras**: Additional style customizations
56
+ - **Sliding Window**: For longer videos
57
+
58
+ ## Your First Video
59
+
60
+ Let's generate a simple text-to-video:
61
+
62
+ 1. **Launch WanGP**: `python wgp.py`
63
+ 2. **Open Browser**: Navigate to `http://localhost:7860`
64
+ 3. **Enter Prompt**: "A cat walking in a garden"
65
+ 4. **Click Generate**: Wait for the video to be created
66
+ 5. **View Result**: The video will appear in the output section
67
+
68
+ ### Recommended First Settings
69
+ - **Model**: Wan 2.1 text2video 1.3B (faster, lower VRAM)
70
+ - **Frames**: 49 (about 2 seconds)
71
+ - **Steps**: 20 (good balance of speed and quality; see the launch command below)
72
+
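+ These defaults can also be set from the command line (flags documented in [CLI.md](CLI.md)):
+
+ ```bash
+ # Launch the 1.3B model with the recommended frame and step counts
+ python wgp.py --t2v-1-3B --frames 49 --steps 20
+ ```
+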
73
+ ## Model Selection
74
+
75
+ ### Text-to-Video Models
76
+ - **Wan 2.1 T2V 1.3B**: Fastest, lowest VRAM (6GB), good quality
77
+ - **Wan 2.1 T2V 14B**: Best quality, requires more VRAM (12GB+)
78
+ - **Hunyuan Video**: Excellent quality, slower generation
79
+ - **LTX Video**: Good for longer videos
80
+
81
+ ### Image-to-Video Models
82
+ - **Wan Fun InP 1.3B**: Fast image animation
83
+ - **Wan Fun InP 14B**: Higher quality image animation
84
+ - **VACE**: Advanced control over video generation
85
+
86
+ ### Choosing the Right Model
87
+ - **Low VRAM (6-8GB)**: Use 1.3B models
88
+ - **Medium VRAM (10-12GB)**: Use 14B models or Hunyuan
89
+ - **High VRAM (16GB+)**: Any model, longer videos
90
+
91
+ ## Basic Settings Explained
92
+
93
+ ### Generation Settings
94
+ - **Frames**: Number of frames (more = longer video)
95
+ - 25 frames ≈ 1 second
96
+ - 49 frames ≈ 2 seconds
97
+ - 73 frames ≈ 3 seconds
98
+
99
+ - **Steps**: Quality vs Speed tradeoff
100
+ - 15 steps: Fast, lower quality
101
+ - 20 steps: Good balance
102
+ - 30+ steps: High quality, slower
103
+
104
+ - **Guidance Scale**: How closely to follow the prompt
105
+ - 3-5: More creative interpretation
106
+ - 7-10: Closer to prompt description
107
+ - 12+: Very literal interpretation
108
+
109
+ ### Seeds
110
+ - **Random Seed**: Different result each time
111
+ - **Fixed Seed**: Reproducible results
112
+ - **Use same seed + prompt**: Generate variations (see the example below)
113
+
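+ For reproducible runs, you can fix the default seed at launch:
+
+ ```bash
+ # The same seed and prompt will reproduce an earlier result
+ python wgp.py --seed 42
+ ```
+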
114
+ ## Common Beginner Issues
115
+
116
+ ### "Out of Memory" Errors
117
+ 1. Use smaller models (1.3B instead of 14B)
118
+ 2. Reduce frame count
119
+ 3. Lower resolution in advanced settings
120
+ 4. Enable quantization (usually on by default)
121
+
122
+ ### Slow Generation
123
+ 1. Use 1.3B models for speed
124
+ 2. Reduce number of steps
125
+ 3. Install Sage attention (see [INSTALLATION.md](INSTALLATION.md))
126
+ 4. Enable TeaCache: `python wgp.py --teacache 2.0`
127
+
128
+ ### Poor Quality Results
129
+ 1. Increase number of steps (25-30)
130
+ 2. Improve prompt description
131
+ 3. Use 14B models if you have enough VRAM
132
+ 4. Enable Skip Layer Guidance in advanced settings
133
+
134
+ ## Writing Good Prompts
135
+
136
+ ### Basic Structure
137
+ ```
138
+ [Subject] [Action] [Setting] [Style/Quality modifiers]
139
+ ```
140
+
141
+ ### Examples
142
+ ```
143
+ A red sports car driving through a mountain road at sunset, cinematic, high quality
144
+
145
+ A woman with long hair walking on a beach, waves in the background, realistic, detailed
146
+
147
+ A cat sitting on a windowsill watching rain, cozy atmosphere, soft lighting
148
+ ```
149
+
150
+ ### Tips
151
+ - Be specific about what you want
152
+ - Include style descriptions (cinematic, realistic, etc.)
153
+ - Mention lighting and atmosphere
154
+ - Describe the setting in detail
155
+ - Use quality modifiers (high quality, detailed, etc.)
156
+
157
+ ## Next Steps
158
+
159
+ Once you're comfortable with basic generation:
160
+
161
+ 1. **Explore Advanced Features**:
162
+ - [Loras Guide](LORAS.md) - Customize styles and characters
163
+ - [VACE ControlNet](VACE.md) - Advanced video control
164
+ - [Command Line Options](CLI.md) - Optimize performance
165
+
166
+ 2. **Improve Performance**:
167
+ - Install better attention mechanisms
168
+ - Optimize memory settings
169
+ - Use compilation for speed
170
+
171
+ 3. **Join the Community**:
172
+ - [Discord Server](https://discord.gg/g7efUW9jGV) - Get help and share videos
173
+ - Share your best results
174
+ - Learn from other users
175
+
176
+ ## Troubleshooting First Steps
177
+
178
+ ### Installation Issues
179
+ - Ensure Python 3.10.9 is used
180
+ - Check CUDA version compatibility
181
+ - See [INSTALLATION.md](INSTALLATION.md) for detailed steps
182
+
183
+ ### Generation Issues
184
+ - Check GPU compatibility
185
+ - Verify sufficient VRAM
186
+ - Try basic settings first
187
+ - See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for specific issues
188
+
189
+ ### Performance Issues
190
+ - Use appropriate model for your hardware
191
+ - Enable performance optimizations
192
+ - Check [CLI.md](CLI.md) for optimization flags
193
+
194
+ Remember: Start simple and gradually explore more advanced features as you become comfortable with the basics!
docs/INSTALLATION.md ADDED
@@ -0,0 +1,170 @@
1
+ # Installation Guide
2
+
3
+ This guide covers installation for different GPU generations and operating systems.
4
+
5
+ ## Requirements
6
+
7
+ - Python 3.10.9
8
+ - Conda or Python venv
9
+ - Compatible GPU (RTX 10XX or newer recommended)
10
+
11
+ ## Installation for RTX 10XX to RTX 40XX (Stable)
12
+
13
+ This installation uses PyTorch 2.6.0 which is well-tested and stable.
14
+
15
+ ### Step 1: Download and Setup Environment
16
+
17
+ ```shell
18
+ # Clone the repository
19
+ git clone https://github.com/deepbeepmeep/Wan2GP.git
20
+ cd Wan2GP
21
+
22
+ # Create Python 3.10.9 environment using conda
23
+ conda create -n wan2gp python=3.10.9
24
+ conda activate wan2gp
25
+ ```
26
+
27
+ ### Step 2: Install PyTorch
28
+
29
+ ```shell
30
+ # Install PyTorch 2.6.0 with CUDA 12.4
31
+ pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
32
+ ```
33
+
34
+ ### Step 3: Install Dependencies
35
+
36
+ ```shell
37
+ # Install core dependencies
38
+ pip install -r requirements.txt
39
+ ```
40
+
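+ To confirm PyTorch can see your GPU before continuing:
+
+ ```shell
+ # Should print True if CUDA is set up correctly
+ python -c "import torch; print(torch.cuda.is_available())"
+ ```
+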
41
+ ### Step 4: Optional Performance Optimizations
42
+
43
+ #### Sage Attention (30% faster)
44
+
45
+ ```shell
46
+ # Windows only: Install Triton
47
+ pip install triton-windows
48
+
49
+ # For both Windows and Linux
50
+ pip install sageattention==1.0.6
51
+ ```
52
+
53
+ #### Sage 2 Attention (40% faster)
54
+
55
+ ```shell
56
+ # Windows
57
+ pip install triton-windows
58
+ pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp310-cp310-win_amd64.whl
59
+
60
+ # Linux (manual compilation required)
61
+ git clone https://github.com/thu-ml/SageAttention
62
+ cd SageAttention
63
+ pip install -e .
64
+ ```
65
+
66
+ #### Flash Attention
67
+
68
+ ```shell
69
+ # May require CUDA kernel compilation on Windows
70
+ pip install flash-attn==2.7.2.post1
71
+ ```
72
+
73
+ ## Installation for RTX 50XX (Beta)
74
+
75
+ RTX 50XX GPUs require PyTorch 2.7.0 (beta). This version may be less stable.
76
+
77
+ ⚠️ **Important:** Use Python 3.10 for compatibility with pip wheels.
78
+
79
+ ### Step 1: Setup Environment
80
+
81
+ ```shell
82
+ # Clone and setup (same as above)
83
+ git clone https://github.com/deepbeepmeep/Wan2GP.git
84
+ cd Wan2GP
85
+ conda create -n wan2gp python=3.10.9
86
+ conda activate wan2gp
87
+ ```
88
+
89
+ ### Step 2: Install PyTorch Beta
90
+
91
+ ```shell
92
+ # Install PyTorch 2.7.0 with CUDA 12.8
93
+ pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
94
+ ```
95
+
96
+ ### Step 3: Install Dependencies
97
+
98
+ ```shell
99
+ pip install -r requirements.txt
100
+ ```
101
+
102
+ ### Step 4: Optional Optimizations for RTX 50XX
103
+
104
+ #### Sage Attention
105
+
106
+ ```shell
107
+ # Windows
108
+ pip install triton-windows
109
+ pip install sageattention==1.0.6
110
+
111
+ # Linux
112
+ pip install sageattention==1.0.6
113
+ ```
114
+
115
+ #### Sage 2 Attention
116
+
117
+ ```shell
118
+ # Windows
119
+ pip install triton-windows
120
+ pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl
121
+
122
+ # Linux (manual compilation)
123
+ git clone https://github.com/thu-ml/SageAttention
124
+ cd SageAttention
125
+ pip install -e .
126
+ ```
127
+
128
+ ## Attention Modes
129
+
130
+ WanGP supports several attention implementations:
131
+
132
+ - **SDPA** (default): Available by default with PyTorch
133
+ - **Sage**: 30% speed boost with small quality cost
134
+ - **Sage2**: 40% speed boost
135
+ - **Flash**: Good performance, may be complex to install on Windows
136
+
137
+ ## Performance Profiles
138
+
139
+ Choose a profile based on your hardware:
140
+
141
+ - **Profile 3 (LowRAM_HighVRAM)**: Loads entire model in VRAM, requires 24GB VRAM for 8-bit quantized 14B model
142
+ - **Profile 4 (LowRAM_LowVRAM)**: Default, loads model parts as needed, slower but lower VRAM requirement (profiles are selected at launch, as shown below)
143
+
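+ The profile is selected at launch:
+
+ ```shell
+ # Keep the whole model in VRAM (about 24GB needed for the quantized 14B model)
+ python wgp.py --profile 3
+ ```
+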
144
+ ## Troubleshooting
145
+
146
+ ### Sage Attention Issues
147
+
148
+ If Sage attention doesn't work:
149
+
150
+ 1. Check if Triton is properly installed
151
+ 2. Clear Triton cache
152
+ 3. Fallback to SDPA attention:
153
+ ```bash
154
+ python wgp.py --attention sdpa
155
+ ```
156
+
157
+ ### Memory Issues
158
+
159
+ - Use lower resolution or shorter videos
160
+ - Enable quantization (default)
161
+ - Use Profile 4 for lower VRAM usage
162
+ - Consider using 1.3B models instead of 14B models
163
+
164
+ ### GPU Compatibility
165
+
166
+ - RTX 10XX, 20XX: Supported with SDPA attention
167
+ - RTX 30XX, 40XX: Full feature support
168
+ - RTX 50XX: Beta support with PyTorch 2.7.0
169
+
170
+ For more troubleshooting, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md)
docs/LORAS.md ADDED
@@ -0,0 +1,197 @@
1
+ # Loras Guide
2
+
3
+ Loras (Low-Rank Adaptations) allow you to customize video generation models by adding specific styles, characters, or effects to your videos.
4
+
5
+ ## Directory Structure
6
+
7
+ Loras are organized in different folders based on the model they're designed for:
8
+
9
+ ### Text-to-Video Models
10
+ - `loras/` - General t2v loras
11
+ - `loras/1.3B/` - Loras specifically for 1.3B models
12
+ - `loras/14B/` - Loras specifically for 14B models
13
+
14
+ ### Image-to-Video Models
15
+ - `loras_i2v/` - Image-to-video loras
16
+
17
+ ### Other Models
18
+ - `loras_hunyuan/` - Hunyuan Video t2v loras
19
+ - `loras_hunyuan_i2v/` - Hunyuan Video i2v loras
20
+ - `loras_ltxv/` - LTX Video loras
21
+
22
+ ## Custom Lora Directory
23
+
24
+ You can specify custom lora directories when launching the app:
25
+
26
+ ```bash
27
+ # Use shared lora directory for both t2v and i2v
28
+ python wgp.py --lora-dir /path/to/shared/loras --lora-dir-i2v /path/to/shared/loras
29
+
30
+ # Specify different directories for different models
31
+ python wgp.py --lora-dir-hunyuan /path/to/hunyuan/loras --lora-dir-ltxv /path/to/ltx/loras
32
+ ```
33
+
34
+ ## Using Loras
35
+
36
+ ### Basic Usage
37
+
38
+ 1. Place your lora files in the appropriate directory
39
+ 2. Launch WanGP
40
+ 3. In the Advanced Tab, select the "Loras" section
41
+ 4. Check the loras you want to activate
42
+ 5. Set multipliers for each lora (default is 1.0)
43
+
44
+ ### Lora Multipliers
45
+
46
+ Multipliers control the strength of each lora's effect:
47
+
48
+ #### Simple Multipliers
49
+ ```
50
+ 1.2 0.8
51
+ ```
52
+ - First lora: 1.2 strength
53
+ - Second lora: 0.8 strength
54
+
55
+ #### Time-based Multipliers
56
+ For dynamic effects over generation steps, use comma-separated values:
57
+ ```
58
+ 0.9,0.8,0.7
59
+ 1.2,1.1,1.0
60
+ ```
61
+ - For 30 steps: steps 0-9 use first value, 10-19 use second, 20-29 use third
62
+ - First lora: 0.9 → 0.8 → 0.7
63
+ - Second lora: 1.2 → 1.1 → 1.0
64
+
65
+ ## Lora Presets
66
+
67
+ Presets are combinations of loras with predefined multipliers and prompts.
68
+
69
+ ### Creating Presets
70
+ 1. Configure your loras and multipliers
71
+ 2. Write a prompt with comments (lines starting with #)
72
+ 3. Save as a preset with `.lset` extension
73
+
74
+ ### Example Preset
75
+ ```
76
+ # Use the keyword "ohnvx" to trigger the lora
77
+ A ohnvx character is driving a car through the city
78
+ ```
79
+
80
+ ### Using Presets
81
+ ```bash
82
+ # Load preset on startup
83
+ python wgp.py --lora-preset mypreset.lset
84
+ ```
85
+
86
+ ### Managing Presets
87
+ - Edit, save, or delete presets directly from the web interface
88
+ - Presets include comments with usage instructions
89
+ - Share `.lset` files with other users
90
+
91
+ ## CausVid Lora (Special)
92
+
93
+ CausVid is a distilled Wan model that generates videos in 4-12 steps with 2x speed improvement.
94
+
95
+ ### Setup Instructions
96
+ 1. Download the CausVid Lora:
97
+ ```
98
+ https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
99
+ ```
100
+ 2. Place it in your `loras/` directory (a download sketch is shown below)
101
+
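+ A minimal download sketch (assuming `wget` and the standard Hugging Face `resolve` download URL for the file above):
+
+ ```bash
+ # Fetch the CausVid lora directly into the t2v loras folder
+ wget -P loras/ "https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors"
+ ```
+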
102
+ ### Usage
103
+ 1. Select a Wan t2v model (e.g., Wan 2.1 text2video 14B or Vace 14B)
104
+ 2. Enable Advanced Mode
105
+ 3. In Advanced Generation Tab:
106
+ - Set Guidance Scale = 1
107
+ - Set Shift Scale = 7
108
+ 4. In Advanced Lora Tab:
109
+ - Select CausVid Lora
110
+ - Set multiplier to 0.3
111
+ 5. Set generation steps to 12
112
+ 6. Generate!
113
+
114
+ ### CausVid Step/Multiplier Relationship
115
+ - **12 steps**: 0.3 multiplier (recommended)
116
+ - **8 steps**: 0.5-0.7 multiplier
117
+ - **4 steps**: 0.8-1.0 multiplier
118
+
119
+ *Note: Lower steps = lower quality (especially motion)*
120
+
121
+ ## Supported Formats
122
+
123
+ WanGP supports multiple lora formats:
124
+ - **Safetensors** (.safetensors)
125
+ - **Replicate** format
126
+ - **Standard PyTorch** (.pt, .pth)
127
+
128
+ ## Performance Tips
129
+
130
+ ### Fast Loading/Unloading
131
+ - Loras can be added/removed without restarting the app
132
+ - Use the "Refresh" button to detect new loras
133
+ - Enable `--check-loras` to filter incompatible loras (slower startup)
134
+
135
+ ### Memory Management
136
+ - Loras are loaded on-demand to save VRAM
137
+ - Multiple loras can be used simultaneously
138
+ - Time-based multipliers don't use extra memory
139
+
140
+ ## Finding Loras
141
+
142
+ ### Sources
143
+ - **[Civitai](https://civitai.com/)** - Large community collection
144
+ - **HuggingFace** - Official and community loras
145
+ - **Discord Server** - Community recommendations
146
+
147
+ ### Creating Loras
148
+ - **Kohya** - Popular training tool
149
+ - **OneTrainer** - Alternative training solution
150
+ - **Custom datasets** - Train on your own content
151
+
152
+ ## Macro System (Advanced)
153
+
154
+ Create multiple prompts from templates using macros:
155
+
156
+ ```
157
+ ! {Subject}="cat","woman","man", {Location}="forest","lake","city", {Possessive}="its","her","his"
158
+ In the video, a {Subject} is presented. The {Subject} is in a {Location} and looks at {Possessive} watch.
159
+ ```
160
+
161
+ This generates:
162
+ 1. "In the video, a cat is presented. The cat is in a forest and looks at its watch."
163
+ 2. "In the video, a woman is presented. The woman is in a lake and looks at her watch."
164
+ 3. "In the video, a man is presented. The man is in a city and looks at his watch."
165
+
166
+ ## Troubleshooting
167
+
168
+ ### Lora Not Working
169
+ 1. Check if lora is compatible with your model size (1.3B vs 14B)
170
+ 2. Verify lora format is supported
171
+ 3. Try different multiplier values
172
+ 4. Check the lora was trained for your model type (t2v vs i2v)
173
+
174
+ ### Performance Issues
175
+ 1. Reduce number of active loras
176
+ 2. Lower multiplier values
177
+ 3. Use `--check-loras` to filter incompatible files
178
+ 4. Clear lora cache if issues persist
179
+
180
+ ### Memory Errors
181
+ 1. Use fewer loras simultaneously
182
+ 2. Reduce model size (use 1.3B instead of 14B)
183
+ 3. Lower video resolution or frame count
184
+ 4. Enable quantization if not already active
185
+
186
+ ## Command Line Options
187
+
188
+ ```bash
189
+ # Lora-related command line options
190
+ --lora-dir path # Path to t2v loras directory
191
+ --lora-dir-i2v path # Path to i2v loras directory
192
+ --lora-dir-hunyuan path # Path to Hunyuan t2v loras
193
+ --lora-dir-hunyuan-i2v path # Path to Hunyuan i2v loras
194
+ --lora-dir-ltxv path # Path to LTX Video loras
195
+ --lora-preset preset # Load preset on startup
196
+ --check-loras # Filter incompatible loras
197
+ ```
docs/MODELS.md ADDED
@@ -0,0 +1,232 @@
1
+ # Models Overview
2
+
3
+ WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations.
4
+
5
+ ## Text-to-Video Models
6
+
7
+ ### Wan 2.1 Models
8
+
9
+ #### Wan 2.1 Text2Video 1.3B
10
+ - **Size**: 1.3 billion parameters
11
+ - **VRAM**: 6GB minimum
12
+ - **Speed**: Fast generation
13
+ - **Quality**: Good quality for the size
14
+ - **Best for**: Quick iterations, lower-end hardware
15
+ - **Command**: `python wgp.py --t2v-1-3B`
16
+
17
+ #### Wan 2.1 Text2Video 14B
18
+ - **Size**: 14 billion parameters
19
+ - **VRAM**: 12GB+ recommended
20
+ - **Speed**: Slower but higher quality
21
+ - **Quality**: Excellent detail and coherence
22
+ - **Best for**: Final production videos
23
+ - **Command**: `python wgp.py --t2v-14B`
24
+
25
+ #### Wan Vace 1.3B
26
+ - **Type**: ControlNet for advanced video control
27
+ - **VRAM**: 6GB minimum
28
+ - **Features**: Motion transfer, object injection, inpainting
29
+ - **Best for**: Advanced video manipulation
30
+ - **Command**: `python wgp.py --vace`
31
+
32
+ #### Wan Vace 14B
33
+ - **Type**: Large ControlNet model
34
+ - **VRAM**: 12GB+ recommended
35
+ - **Features**: All Vace features with higher quality
36
+ - **Best for**: Professional video editing workflows
37
+
38
+ ### Hunyuan Video Models
39
+
40
+ #### Hunyuan Video Text2Video
41
+ - **Quality**: Among the best open source t2v models
42
+ - **VRAM**: 12GB+ recommended
43
+ - **Speed**: Slower generation but excellent results
44
+ - **Features**: Superior text adherence and video quality
45
+ - **Best for**: High-quality text-to-video generation
46
+
47
+ #### Hunyuan Video Custom
48
+ - **Specialty**: Identity preservation
49
+ - **Use case**: Injecting specific people into videos
50
+ - **Quality**: Excellent for character consistency
51
+ - **Best for**: Character-focused video generation
52
+
53
+ ### LTX Video Models
54
+
55
+ #### LTX Video 13B
56
+ - **Specialty**: Long video generation
57
+ - **Resolution**: Fast 720p generation
58
+ - **VRAM**: Optimized by WanGP (4x reduction in requirements)
59
+ - **Best for**: Longer duration videos
60
+
61
+ #### LTX Video 13B Distilled
62
+ - **Speed**: Generate in less than one minute
63
+ - **Quality**: Very high quality despite speed
64
+ - **Best for**: Rapid prototyping and quick results
65
+
66
+ ### Other Models
67
+
68
+ #### Sky Reels v2
69
+ - **Type**: Diffusion Forcing model
70
+ - **Specialty**: "Infinite length" videos
71
+ - **Features**: High quality continuous generation
72
+ - **Note**: Uses causal attention (SDPA only)
73
+
74
+ #### MoviiGen (Experimental)
75
+ - **Resolution**: Claims 1080p capability
76
+ - **VRAM**: 20GB+ required
77
+ - **Speed**: Very slow generation
78
+ - **Status**: Experimental, feedback welcome
79
+
80
+ #### CausVid (Via Lora)
81
+ - **Type**: Distilled model (Lora implementation)
82
+ - **Speed**: 4-12 steps generation, 2x faster
83
+ - **Compatible**: Works with Wan 14B models
84
+ - **Setup**: Requires CausVid Lora (see [LORAS.md](LORAS.md))
85
+
86
+ ## Image-to-Video Models
87
+
88
+ ### Wan Fun InP Models
89
+
90
+ #### Wan Fun InP 1.3B
91
+ - **Size**: 1.3 billion parameters
92
+ - **VRAM**: 6GB minimum
93
+ - **Quality**: Good for its size, accessible on lower-end hardware
94
+ - **Best for**: Entry-level image animation
95
+ - **Command**: `python wgp.py --i2v-1-3B`
96
+
97
+ #### Wan Fun InP 14B
98
+ - **Size**: 14 billion parameters
99
+ - **VRAM**: 12GB+ recommended
100
+ - **Quality**: Better end image support
101
+ - **Limitation**: Existing loras don't work as well
102
+ - **Command**: `python wgp.py --i2v-14B`
103
+
104
+ ### Specialized Models
105
+
106
+ #### FantasySpeaking
107
+ - **Type**: Talking head animation
108
+ - **Input**: Voice track + image
109
+ - **Works on**: People and objects
110
+ - **Use case**: Lip-sync and voice-driven animation
111
+
112
+ #### Phantom
113
+ - **Type**: Person/object transfer
114
+ - **Resolution**: Works well at 720p
115
+ - **Requirements**: 30+ steps for good results
116
+ - **Best for**: Transferring subjects between videos
117
+
118
+ #### Recam Master
119
+ - **Type**: Viewpoint change
120
+ - **Requirements**: 81+ frame input videos, 15+ denoising steps
121
+ - **Use case**: View same scene from different angles
122
+
123
+ #### FLF2V
124
+ - **Type**: Start/end frame specialist
125
+ - **Resolution**: Optimized for 720p
126
+ - **Official**: Wan team supported
127
+ - **Use case**: Image-to-video with specific endpoints
128
+
129
+ ## Model Selection Guide
130
+
131
+ ### By Hardware (VRAM)
132
+
133
+ #### 6-8GB VRAM
134
+ - Wan 2.1 T2V 1.3B
135
+ - Wan Fun InP 1.3B
136
+ - Wan Vace 1.3B
137
+
138
+ #### 10-12GB VRAM
139
+ - Wan 2.1 T2V 14B
140
+ - Wan Fun InP 14B
141
+ - Hunyuan Video (with optimizations)
142
+ - LTX Video 13B
143
+
144
+ #### 16GB+ VRAM
145
+ - All models supported
146
+ - Longer videos possible
147
+ - Higher resolutions
148
+ - Multiple simultaneous Loras
149
+
150
+ #### 20GB+ VRAM
151
+ - MoviiGen (experimental 1080p)
152
+ - Very long videos
153
+ - Maximum quality settings
154
+
155
+ ### By Use Case
156
+
157
+ #### Quick Prototyping
158
+ 1. **LTX Video 13B Distilled** - Fastest, high quality
159
+ 2. **Wan 2.1 T2V 1.3B** - Fast, good quality
160
+ 3. **CausVid Lora** - 4-12 steps, very fast
161
+
162
+ #### Best Quality
163
+ 1. **Hunyuan Video** - Overall best t2v quality
164
+ 2. **Wan 2.1 T2V 14B** - Excellent Wan quality
165
+ 3. **Wan Vace 14B** - Best for controlled generation
166
+
167
+ #### Advanced Control
168
+ 1. **Wan Vace 14B/1.3B** - Motion transfer, object injection
169
+ 2. **Phantom** - Person/object transfer
170
+ 3. **FantasySpeaking** - Voice-driven animation
171
+
172
+ #### Long Videos
173
+ 1. **LTX Video 13B** - Specialized for length
174
+ 2. **Sky Reels v2** - Infinite length videos
175
+ 3. **Wan Vace + Sliding Windows** - Up to 1 minute
176
+
177
+ #### Lower Hardware
178
+ 1. **Wan Fun InP 1.3B** - Image-to-video
179
+ 2. **Wan 2.1 T2V 1.3B** - Text-to-video
180
+ 3. **Wan Vace 1.3B** - Advanced control
181
+
182
+ ## Performance Comparison
183
+
184
+ ### Speed (Relative)
185
+ 1. **CausVid Lora** (4-12 steps) - Fastest
186
+ 2. **LTX Video Distilled** - Very fast
187
+ 3. **Wan 1.3B models** - Fast
188
+ 4. **Wan 14B models** - Medium
189
+ 5. **Hunyuan Video** - Slower
190
+ 6. **MoviiGen** - Slowest
191
+
192
+ ### Quality (Subjective)
193
+ 1. **Hunyuan Video** - Highest overall
194
+ 2. **Wan 14B models** - Excellent
195
+ 3. **LTX Video models** - Very good
196
+ 4. **Wan 1.3B models** - Good
197
+ 5. **CausVid** - Good (varies with steps)
198
+
199
+ ### VRAM Efficiency
200
+ 1. **Wan 1.3B models** - Most efficient
201
+ 2. **LTX Video** (with WanGP optimizations)
202
+ 3. **Wan 14B models**
203
+ 4. **Hunyuan Video**
204
+ 5. **MoviiGen** - Least efficient
205
+
206
+ ## Model Switching
207
+
208
+ WanGP allows switching between models without restarting:
209
+
210
+ 1. Use the dropdown menu in the web interface
211
+ 2. Models are loaded on-demand
212
+ 3. Previous model is unloaded to save VRAM
213
+ 4. Settings are preserved when possible
214
+
215
+ ## Tips for Model Selection
216
+
217
+ ### First Time Users
218
+ Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware.
219
+
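+ For example:
+
+ ```bash
+ # Smallest Wan model: quick to load and runs on 6GB of VRAM
+ python wgp.py --t2v-1-3B
+ ```
+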
220
+ ### Production Work
221
+ Use **Hunyuan Video** or **Wan 14B** models for final output quality.
222
+
223
+ ### Experimentation
224
+ **CausVid Lora** or **LTX Distilled** for rapid iteration and testing.
225
+
226
+ ### Specialized Tasks
227
+ - **VACE** for advanced control
228
+ - **FantasySpeaking** for talking heads
229
+ - **LTX Video** for long sequences
230
+
231
+ ### Hardware Optimization
232
+ Always start with the largest model your VRAM can handle, then optimize settings for speed vs quality based on your needs.
docs/TROUBLESHOOTING.md ADDED
@@ -0,0 +1,338 @@
1
+ # Troubleshooting Guide
2
+
3
+ This guide covers common issues and their solutions when using WanGP.
4
+
5
+ ## Installation Issues
6
+
7
+ ### PyTorch Installation Problems
8
+
9
+ #### CUDA Version Mismatch
10
+ **Problem**: PyTorch can't detect the GPU, or you see CUDA errors
11
+ **Solution**:
12
+ ```bash
13
+ # Check your CUDA version
14
+ nvidia-smi
15
+
16
+ # Install matching PyTorch version
17
+ # For CUDA 12.4 (RTX 10XX-40XX)
18
+ pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124
19
+
20
+ # For CUDA 12.8 (RTX 50XX)
21
+ pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
22
+ ```
23
+
24
+ #### Python Version Issues
25
+ **Problem**: Package compatibility errors
26
+ **Solution**: Ensure you're using Python 3.10.9
27
+ ```bash
28
+ python --version # Should show 3.10.9
29
+ conda create -n wan2gp python=3.10.9
30
+ ```
31
+
32
+ ### Dependency Installation Failures
33
+
34
+ #### Triton Installation (Windows)
35
+ **Problem**: `pip install triton-windows` fails
36
+ **Solution**:
37
+ 1. Update pip: `pip install --upgrade pip`
38
+ 2. Try pre-compiled wheel
39
+ 3. Fallback to SDPA attention: `python wgp.py --attention sdpa`
40
+
41
+ #### SageAttention Compilation Issues
42
+ **Problem**: SageAttention installation fails
43
+ **Solution**:
44
+ 1. Install Visual Studio Build Tools (Windows)
45
+ 2. Use pre-compiled wheels when available
46
+ 3. Fallback to basic attention modes
47
+
48
+ ## Memory Issues
49
+
50
+ ### CUDA Out of Memory
51
+
52
+ #### During Model Loading
53
+ **Problem**: "CUDA out of memory" when loading model
54
+ **Solutions**:
55
+ ```bash
56
+ # Use smaller model
57
+ python wgp.py --t2v-1-3B
58
+
59
+ # Enable quantization (usually default)
60
+ python wgp.py --quantize-transformer True
61
+
62
+ # Use memory-efficient profile
63
+ python wgp.py --profile 4
64
+
65
+ # Reduce preloaded model size
66
+ python wgp.py --preload 0
67
+ ```
68
+
69
+ #### During Video Generation
70
+ **Problem**: Memory error during generation
71
+ **Solutions**:
72
+ 1. Reduce frame count (shorter videos)
73
+ 2. Lower resolution in advanced settings
74
+ 3. Use lower batch size
75
+ 4. Clear GPU cache between generations
76
+
77
+ ### System RAM Issues
78
+
79
+ #### High RAM Usage
80
+ **Problem**: System runs out of RAM
81
+ **Solutions**:
82
+ ```bash
83
+ # Limit reserved memory
84
+ python wgp.py --perc-reserved-mem-max 0.3
85
+
86
+ # Use minimal RAM profile
87
+ python wgp.py --profile 5
88
+
89
+ # Enable swap file (OS level)
90
+ ```
91
+
92
+ ## Performance Issues
93
+
94
+ ### Slow Generation Speed
95
+
96
+ #### General Optimization
97
+ ```bash
98
+ # Enable compilation (requires Triton)
99
+ python wgp.py --compile
100
+
101
+ # Use faster attention
102
+ python wgp.py --attention sage2
103
+
104
+ # Enable TeaCache
105
+ python wgp.py --teacache 2.0
106
+
107
+ # Use high-performance profile
108
+ python wgp.py --profile 3
109
+ ```
110
+
111
+ #### GPU-Specific Optimizations
112
+
113
+ **RTX 10XX/20XX Series**:
114
+ ```bash
115
+ python wgp.py --attention sdpa --profile 4 --teacache 1.5
116
+ ```
117
+
118
+ **RTX 30XX/40XX Series**:
119
+ ```bash
120
+ python wgp.py --compile --attention sage --profile 3 --teacache 2.0
121
+ ```
122
+
123
+ **RTX 50XX Series**:
124
+ ```bash
125
+ python wgp.py --attention sage --profile 4 --fp16
126
+ ```
127
+
128
+ ### Attention Mechanism Issues
129
+
130
+ #### Sage Attention Not Working
131
+ **Problem**: Sage attention fails to compile or work
132
+ **Diagnostic Steps**:
133
+ 1. Check Triton installation:
134
+ ```python
135
+ import triton
136
+ print(triton.__version__)
137
+ ```
138
+ 2. Clear Triton cache:
139
+ ```bash
140
+ # Windows
141
+ rmdir /s %USERPROFILE%\.triton
142
+ # Linux
143
+ rm -rf ~/.triton
144
+ ```
145
+ 3. Fallback solution:
146
+ ```bash
147
+ python wgp.py --attention sdpa
148
+ ```
149
+
150
+ #### Flash Attention Issues
151
+ **Problem**: Flash attention compilation fails
152
+ **Solution**:
153
+ - Windows: Often requires manual CUDA kernel compilation
154
+ - Linux: Usually works with `pip install flash-attn`
155
+ - Fallback: Use Sage or SDPA attention
156
+
157
+ ## Model-Specific Issues
158
+
159
+ ### Lora Problems
160
+
161
+ #### Loras Not Loading
162
+ **Problem**: Loras don't appear in the interface
163
+ **Solutions**:
164
+ 1. Check file format (should be .safetensors, .pt, or .pth)
165
+ 2. Verify correct directory:
166
+ ```
167
+ loras/ # For t2v models
168
+ loras_i2v/ # For i2v models
169
+ loras_hunyuan/ # For Hunyuan models
170
+ ```
171
+ 3. Click "Refresh" button in interface
172
+ 4. Use `--check-loras` to filter incompatible files
173
+
174
+ #### Lora Compatibility Issues
175
+ **Problem**: Lora causes errors or poor results
176
+ **Solutions**:
177
+ 1. Check model size compatibility (1.3B vs 14B)
178
+ 2. Verify lora was trained for your model type
179
+ 3. Try different multiplier values
180
+ 4. Use `--check-loras` flag to auto-filter
181
+
182
+ ### VACE-Specific Issues
183
+
184
+ #### Poor VACE Results
185
+ **Problem**: VACE generates poor quality or unexpected results
186
+ **Solutions**:
187
+ 1. Enable Skip Layer Guidance
188
+ 2. Use detailed prompts describing all elements
189
+ 3. Ensure proper mask creation with Matanyone
190
+ 4. Check reference image quality
191
+ 5. Use at least 15 steps, preferably 30+
192
+
193
+ #### Matanyone Tool Issues
194
+ **Problem**: Mask creation difficulties
195
+ **Solutions**:
196
+ 1. Use negative point prompts to refine selection
197
+ 2. Create multiple sub-masks and combine them
198
+ 3. Try different background removal options
199
+ 4. Ensure sufficient contrast in source video
200
+
201
+ ## Network and Server Issues
202
+
203
+ ### Gradio Interface Problems
204
+
205
+ #### Port Already in Use
206
+ **Problem**: "Port 7860 is already in use"
207
+ **Solution**:
208
+ ```bash
209
+ # Use different port
210
+ python wgp.py --server-port 7861
211
+
212
+ # Or kill existing process
213
+ # Windows
214
+ netstat -ano | findstr :7860
215
+ taskkill /PID <PID> /F
216
+
217
+ # Linux
218
+ lsof -i :7860
219
+ kill <PID>
220
+ ```
221
+
222
+ #### Interface Not Loading
223
+ **Problem**: Browser shows "connection refused"
224
+ **Solutions**:
225
+ 1. Check if server started successfully
226
+ 2. Try `http://127.0.0.1:7860` instead of `localhost:7860`
227
+ 3. Disable firewall temporarily
228
+ 4. Use `--listen` flag for network access
229
+
230
+ ### Remote Access Issues
231
+
232
+ #### Sharing Not Working
233
+ **Problem**: `--share` flag doesn't create public URL
234
+ **Solutions**:
235
+ 1. Check internet connection
236
+ 2. Try different network
237
+ 3. Use `--listen` with port forwarding (see the sketch below)
238
+ 4. Check firewall settings
239
+
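+ As a sketch, SSH port forwarding is a common alternative to `--share` (hypothetical host name shown):
+
+ ```bash
+ # Run on your local machine, then open http://localhost:7860 in your browser
+ ssh -L 7860:localhost:7860 user@remote-host
+ ```
+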
240
+ ## Quality Issues
241
+
242
+ ### Poor Video Quality
243
+
244
+ #### General Quality Improvements
245
+ 1. Increase number of steps (25-30+)
246
+ 2. Use larger models (14B instead of 1.3B)
247
+ 3. Enable Skip Layer Guidance
248
+ 4. Improve prompt descriptions
249
+ 5. Use higher resolution settings
250
+
251
+ #### Specific Quality Issues
252
+
253
+ **Blurry Videos**:
254
+ - Increase steps
255
+ - Check source image quality (i2v)
256
+ - Reduce TeaCache multiplier
257
+ - Use higher guidance scale
258
+
259
+ **Inconsistent Motion**:
260
+ - Use longer overlap in sliding windows
261
+ - Reduce window size
262
+ - Improve prompt consistency
263
+ - Check control video quality (VACE)
264
+
265
+ **Color Issues**:
266
+ - Check model compatibility
267
+ - Adjust guidance scale
268
+ - Verify input image color space
269
+ - Try different VAE settings
270
+
271
+ ## Advanced Debugging
272
+
273
+ ### Enable Verbose Output
274
+ ```bash
275
+ # Maximum verbosity
276
+ python wgp.py --verbose 2
277
+
278
+ # Check lora compatibility
279
+ python wgp.py --check-loras --verbose 2
280
+ ```
281
+
282
+ ### Memory Debugging
283
+ ```bash
284
+ # Monitor GPU memory
285
+ nvidia-smi -l 1
286
+
287
+ # Reduce memory usage
288
+ python wgp.py --profile 4 --perc-reserved-mem-max 0.2
289
+ ```
290
+
291
+ ### Performance Profiling
292
+ ```bash
293
+ # Test different configurations
294
+ python wgp.py --attention sdpa --profile 4 # Baseline
295
+ python wgp.py --attention sage --profile 3 # Performance
296
+ python wgp.py --compile --teacache 2.0 # Maximum speed
297
+ ```
298
+
299
+ ## Getting Help
300
+
301
+ ### Before Asking for Help
302
+ 1. Check this troubleshooting guide
303
+ 2. Read the relevant documentation:
304
+ - [Installation Guide](INSTALLATION.md)
305
+ - [Getting Started](GETTING_STARTED.md)
306
+ - [Command Line Reference](CLI.md)
307
+ 3. Try basic fallback configuration:
308
+ ```bash
309
+ python wgp.py --attention sdpa --profile 4
310
+ ```
311
+
312
+ ### Community Support
313
+ - **Discord Server**: https://discord.gg/g7efUW9jGV
314
+ - Provide relevant information:
315
+ - GPU model and VRAM amount
316
+ - Python and PyTorch versions
317
+ - Complete error messages
318
+ - Command used to launch WanGP
319
+ - Operating system
320
+
321
+ ### Reporting Bugs
322
+ When reporting issues:
323
+ 1. Include system specifications
324
+ 2. Provide complete error logs
325
+ 3. List the exact steps to reproduce
326
+ 4. Mention any modifications to default settings
327
+ 5. Include command line arguments used
328
+
329
+ ## Emergency Fallback
330
+
331
+ If nothing works, try this minimal configuration:
332
+ ```bash
333
+ # Absolute minimum setup
334
+ python wgp.py --t2v-1-3B --attention sdpa --profile 4 --teacache 0 --fp16
335
+
336
+ # If that fails, check basic PyTorch installation
337
+ python -c "import torch; print(torch.cuda.is_available())"
338
+ ```
docs/VACE.md ADDED
@@ -0,0 +1,190 @@
1
+ # VACE ControlNet Guide
2
+
3
+ VACE is a powerful ControlNet that enables Video-to-Video and Reference-to-Video generation. It allows you to inject your own images into output videos, animate characters, perform inpainting/outpainting, and continue videos.
4
+
5
+ ## Overview
6
+
7
+ VACE is probably one of the most powerful Wan models available. With it, you can:
8
+ - Inject people or objects into scenes
9
+ - Animate characters
10
+ - Perform video inpainting and outpainting
11
+ - Continue existing videos
12
+ - Transfer motion from one video to another
13
+ - Change the style of scenes while preserving depth
14
+
15
+ ## Getting Started
16
+
17
+ ### Model Selection
18
+ 1. Select either "Vace 1.3B" or "Vace 14B" from the dropdown menu (or preselect VACE at launch, as shown below)
19
+ 2. Note: VACE works best with videos up to 7 seconds with the Riflex option enabled
20
+
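+ You can also preselect the 1.3B VACE model at launch (flag documented in [CLI.md](CLI.md)):
+
+ ```bash
+ # Launch WanGP with the VACE ControlNet 1.3B model loaded
+ python wgp.py --vace
+ ```
+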
21
+ ### Input Types
22
+
23
+ VACE accepts three types of visual hints (which can be combined):
24
+
25
+ #### 1. Control Video
26
+ - Transfer motion or depth to a new video
27
+ - Use only the first n frames and extrapolate the rest
28
+ - Perform inpainting by marking the areas to regenerate in grey (value 127)
29
+ - Grey areas will be filled based on text prompt and reference images
30
+
31
+ #### 2. Reference Images
32
+ - Use as background/setting for the video
33
+ - Inject people or objects of your choice
34
+ - Select multiple reference images
35
+ - **Tip**: Replace complex backgrounds with white for better object integration
36
+ - Always describe injected objects/people explicitly in your text prompt
37
+
38
+ #### 3. Video Mask
39
+ - Stronger control over which parts to keep (black) or replace (white)
40
+ - Perfect for inpainting/outpainting
41
+ - Example: White mask except at beginning/end (black) keeps first/last frames while generating middle content
42
+
43
+ ## Common Use Cases
44
+
45
+ ### Motion Transfer
46
+ **Goal**: Animate a character of your choice using motion from another video
47
+ **Setup**:
48
+ - Reference Images: Your character
49
+ - Control Video: Person performing desired motion
50
+ - Text Prompt: Describe your character and the action
51
+
52
+ ### Object/Person Injection
53
+ **Goal**: Insert people or objects into a scene
54
+ **Setup**:
55
+ - Reference Images: The people/objects to inject
56
+ - Text Prompt: Describe the scene and explicitly mention the injected elements
57
+
58
+ ### Character Animation
59
+ **Goal**: Animate a character based on text description
60
+ **Setup**:
61
+ - Control Video: Video of person moving
62
+ - Text Prompt: Detailed description of your character
63
+
64
+ ### Style Transfer with Depth
65
+ **Goal**: Change scene style while preserving spatial relationships
66
+ **Setup**:
67
+ - Control Video: Original video (for depth information)
68
+ - Text Prompt: New style description
69
+
70
+ ## Integrated Matanyone Tool
71
+
72
+ WanGP includes the Matanyone tool, specifically tuned for VACE workflows. This helps create control videos and masks simultaneously.
73
+
74
+ ### Creating Face Replacement Masks
75
+ 1. Load your video in Matanyone
76
+ 2. Click on the face in the first frame
77
+ 3. Create a mask for the face
78
+ 4. Generate both control video and mask video with "Generate Video Matting"
79
+ 5. Export to VACE with "Export to current Video Input and Video Mask"
80
+ 6. Load replacement face image in Reference Images field
81
+
82
+ ### Advanced Matanyone Tips
83
+ - **Negative Point Prompts**: Remove parts from current selection
84
+ - **Sub Masks**: Create multiple independent masks, then combine them
85
+ - **Background Masks**: Select everything except the character (useful for background replacement)
86
+ - Enable/disable sub masks in Matanyone settings
87
+
88
+ ## Recommended Settings
89
+
90
+ ### Quality Settings
91
+ - **Skip Layer Guidance**: Turn ON with default configuration for better results
92
+ - **Long Prompts**: Use detailed descriptions, especially for background elements not in reference images
93
+ - **Steps**: Use at least 15 steps for good quality, 30+ for best results
94
+
95
+ ### Sliding Window Settings
96
+ For very long videos, configure sliding windows properly:
97
+
98
+ - **Window Size**: Set appropriate duration for your content
99
+ - **Overlap Frames**: Long enough for motion continuity, short enough to avoid blur propagation
100
+ - **Discard Last Frames**: Remove at least 4 frames from each window (VACE 1.3B tends to blur final frames)
101
+
102
+ ### Background Removal
103
+ VACE includes automatic background removal options:
104
+ - Use for reference images containing people/objects
105
+ - **Don't use** for landscape/setting reference images (first reference image)
106
+ - Multiple background removal types available
107
+
108
+ ## Window Sliding for Long Videos
109
+
110
+ Generate videos up to 1 minute by merging multiple windows:
111
+
112
+ ### How It Works
113
+ - Each window uses corresponding time segment from control video
114
+ - Example: 0-4s control video → first window, 4-8s → second window, etc.
115
+ - Automatic overlap management ensures smooth transitions
116
+
117
+ ### Settings
118
+ - **Window Size**: Duration of each generation window
119
+ - **Overlap Frames**: Frames shared between windows for continuity
120
+ - **Discard Last Frames**: Remove poor-quality ending frames
121
+ - **Add Overlapped Noise**: Reduce quality degradation over time
122
+
123
+ ### Formula
124
+ ```
125
+ Generated Frames = [Windows - 1] × [Window Size - Overlap - Discard] + Window Size
126
+ ```
127
+
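+ Worked example with illustrative numbers: 3 windows of 81 frames, 16 overlap frames, and 4 discarded frames give (3 - 1) × (81 - 16 - 4) + 81 = 203 generated frames.
+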
128
+ ### Multi-Line Prompts (Experimental)
129
+ - Each line of prompt used for different window
130
+ - If more windows than prompt lines, last line repeats
131
+ - Separate lines with carriage return
132
+
133
+ ## Advanced Features
134
+
135
+ ### Extend Video
136
+ Click "Extend the Video Sample, Please!" during generation to add more windows dynamically.
137
+
138
+ ### Noise Addition
139
+ Add noise to overlapped frames to hide accumulated errors and quality degradation.
140
+
141
+ ### Frame Truncation
142
+ Automatically remove lower-quality final frames from each window (recommended: 4 frames for VACE 1.3B).
143
+
144
+ ## External Resources
145
+
146
+ ### Official VACE Resources
147
+ - **GitHub**: https://github.com/ali-vilab/VACE/tree/main/vace/gradios
148
+ - **User Guide**: https://github.com/ali-vilab/VACE/blob/main/UserGuide.md
149
+ - **Preprocessors**: Gradio tools for preparing materials
150
+
151
+ ### Recommended External Tools
152
+ - **Annotation Tools**: For creating precise masks
153
+ - **Video Editors**: For preparing control videos
154
+ - **Background Removal**: For cleaning reference images
155
+
156
+ ## Troubleshooting
157
+
158
+ ### Poor Quality Results
159
+ 1. Use longer, more detailed prompts
160
+ 2. Enable Skip Layer Guidance
161
+ 3. Increase number of steps (30+)
162
+ 4. Check reference image quality
163
+ 5. Ensure proper mask creation
164
+
165
+ ### Inconsistent Windows
166
+ 1. Increase overlap frames
167
+ 2. Use consistent prompting across windows
168
+ 3. Add noise to overlapped frames
169
+ 4. Reduce discard frames if losing too much content
170
+
171
+ ### Memory Issues
172
+ 1. Use VACE 1.3B instead of 14B
173
+ 2. Reduce video length or resolution
174
+ 3. Decrease window size
175
+ 4. Enable quantization
176
+
177
+ ### Blurry Results
178
+ 1. Reduce overlap frames
179
+ 2. Increase discard last frames
180
+ 3. Use higher resolution reference images
181
+ 4. Check control video quality
182
+
183
+ ## Tips for Best Results
184
+
185
+ 1. **Detailed Prompts**: Describe everything in the scene, especially elements not in reference images
186
+ 2. **Quality Reference Images**: Use high-resolution, well-lit reference images
187
+ 3. **Proper Masking**: Take time to create precise masks with Matanyone
188
+ 4. **Iterative Approach**: Start with short videos, then extend successful results
189
+ 5. **Background Preparation**: Remove complex backgrounds from object/person reference images
190
+ 6. **Consistent Lighting**: Match lighting between reference images and intended scene