How did you add VACE capabilities to the Wan2.2 base model?
Hi I'm curious how you added on VACE to the base model, are you going to open source the process? I've seen models like self-forcing and self-forcing VACE pop up unexpectedly, but I'm interested in how to engineer this sort of stuff so that I can contribute myself. Also is there/will there be an update on the infinite/long-form video extension w/o quality degradation?
Hello, try reading more here from the people who have tried it. They mention lower quality; it's just a test model:
https://www.reddit.com/r/StableDiffusion/comments/1mbrssd/wan_22_vace_experimental_is_out/
Hi, I know it's only a test model, but I'm curious how they did it in terms of implementation. For example, did they just take the original VACE layers (with those weights) and add them in to match the previous Wan2.1 architecture? And if so, did they do a little fine-tuning to improve consistency so it works better with the new MoE architecture? There are lots of things I'd like to know about the practical ways they tried to get VACE working with Wan2.2.
Hi @EladofWar,
Thanks for the questions.
The process is documented in the model card or README.md, but here’s a more detailed breakdown for clarity.
How VACE is Added to the Base Model
- Started by experimenting with extracting/injecting VACE scopes using Python scripts from wsbagnsv1. The scripts (see the sketch after this list):
  - Locate VACE-related tensors (e.g., vace_blocks, vace_patch_embedding).
  - Save them separately.
  - Inject them into the Wan2.2 14B T2V models.
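To make the idea concrete, here's a minimal sketch of that extract-and-inject flow. These are not wsbagnsv1's actual scripts: the file names are placeholders, single-file checkpoints are assumed, and only the key prefixes come from the description above.

```python
# Minimal sketch (assumed file names) of the extract-and-inject idea:
# pull VACE-specific tensors out of a Wan2.1 VACE checkpoint, then merge
# them into a Wan2.2 T2V state dict.
from safetensors.torch import load_file, save_file

VACE_PREFIXES = ("vace_blocks", "vace_patch_embedding")

# 1) Extract: keep only the VACE-related tensors from the 2.1 VACE model.
vace_src = load_file("wan2.1_vace_14b.safetensors")  # hypothetical filename
vace_only = {k: v for k, v in vace_src.items() if k.startswith(VACE_PREFIXES)}
save_file(vace_only, "vace_module_only.safetensors")

# 2) Inject: overlay the VACE tensors onto a Wan2.2 14B T2V state dict.
base = load_file("wan2.2_t2v_14b.safetensors")  # hypothetical filename
merged = {**base, **vace_only}  # VACE keys are new; base keys stay unchanged
save_file(merged, "wan2.2_vace_14b_experimental.safetensors")
```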
This works because of how VACE is designed (reference):

> The VACE model has only undergone Context Adapter Tuning and has not made any parameter changes to the original Wan2.1-T2V-1.3B/14B. This design converges more quickly without compromising the capabilities of the original base model, which is beneficial for all subsequent plugin-type functionalities based on the original T2V model within the community.
>
> The Hugging Face library (Wan2.1-VACE-14B) shows that the loaded model consists of [original T2V parameters (part 00001~00006) + VACE module parameters (diffusion_pytorch_model-00007-of-00007.safetensors)], mainly for convenience in loading and code reusability considerations.
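You can also sanity-check that "plugin" property yourself: the VACE checkpoint should only add new keys, and the keys it shares with the base T2V model should hold identical tensors. A quick sketch, assuming single-file checkpoints with hypothetical names:

```python
# Sanity check: VACE = base T2V parameters + extra VACE-only tensors.
import torch
from safetensors.torch import load_file

t2v = load_file("wan2.1_t2v_14b.safetensors")    # hypothetical filename
vace = load_file("wan2.1_vace_14b.safetensors")  # hypothetical filename

extra = sorted(set(vace) - set(t2v))   # keys only the VACE model has
shared = set(vace) & set(t2v)          # keys present in both checkpoints

print("VACE-only keys (sample):", extra[:5])
# If Context Adapter Tuning left the base untouched, shared tensors match.
print("base unchanged:", all(torch.equal(vace[k], t2v[k]) for k in shared))
```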
Alternative Approaches
Using Kijai’s WanVideoWrapper nodes: Loading the VACE module at runtime (example workflow).
For GGUFs, the process could be simplified using ComfyUI nodes (e.g., combining the VACE modules with a model merge node and quantizing with ComfyUI-ModelQuantizer). I haven't tested this myself yet, since I'm more used to the Python script method.
Community & Resources
If you haven't already, it's highly recommended to join the Banodoco Discord:
Direct invite link: https://discord.com/invite/acg8aNBTxd (might expire)
Or get the latest link from banodoco.ai
It's a very active community with many contributors and lots of shared work and research.
VACE Video & Quality Degradation
There are improvements planned for the color-shifting and degradation issues.
Best way to help: provide feedback and examples to the VACE team via their issue tracker.