DFloat11 Compressed Model: Wan-AI/Wan2.2-I2V-A14B
This is a DFloat11 losslessly compressed version of the original Wan-AI/Wan2.2-I2V-A14B
model. It reduces model size by 32% compared to the original BFloat16 model, while maintaining bit-identical outputs and supporting efficient GPU inference.
🔥🔥🔥 Thanks to DFloat11 compression, Wan-AI/Wan2.2-I2V-A14B can now generate a 5-second 720P video on a single 24GB GPU, while maintaining full model quality. 🔥🔥🔥
📊 Performance Comparison
| Model | Model Size | Peak GPU Memory (5-second 720P generation) | Generation Time (A100 GPU) |
|---|---|---|---|
| Wan-AI/Wan2.2-I2V-A14B (BFloat16) | ~56 GB | O.O.M. | - |
| Wan-AI/Wan2.2-I2V-A14B (DFloat11) | 19.47 GB + 19.44 GB | 29.12 GB | 42 minutes |
| Wan-AI/Wan2.2-I2V-A14B (DFloat11 + CPU Offloading) | 19.47 GB + 19.44 GB | 20.01 GB | 44 minutes |
🔍 How It Works
We apply Huffman coding to the exponent bits of BFloat16 model weights, which are highly compressible. We leverage hardware-aware algorithmic designs to enable highly efficient, on-the-fly weight decompression directly on the GPU. Find out more in our research paper.
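For intuition, the short sketch below (a rough illustration, not the actual DFloat11 codec; a random tensor stands in for real checkpoint weights) measures the empirical entropy of the 8-bit exponent field of a BFloat16 tensor. Because the exponents are heavily skewed toward a few values, a Huffman code can store them in only a few bits each, while the sign and mantissa bits are kept unchanged, which is what keeps the compression lossless.

```python
# Minimal sketch (not the DFloat11 implementation) of why BFloat16 exponents
# compress well: their empirical entropy is far below the 8 bits they occupy.
import torch

# Random weights as a stand-in for a real checkpoint.
weights = torch.randn(1_000_000, dtype=torch.bfloat16)

# BFloat16 layout: 1 sign bit | 8 exponent bits | 7 mantissa bits.
bits = weights.view(torch.int16).to(torch.int32) & 0xFFFF
exponents = (bits >> 7) & 0xFF

# Empirical entropy of the exponent field, in bits per weight.
counts = torch.bincount(exponents.long(), minlength=256).float()
probs = counts[counts > 0] / counts.sum()
entropy = -(probs * probs.log2()).sum().item()

print(f"Exponent entropy: {entropy:.2f} bits (vs. 8 bits stored)")
# A Huffman code approaches this entropy, so storing sign + mantissa as-is and
# entropy-coded exponents lands near 11 bits per weight instead of 16.
```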
🔧 How to Use
Install or upgrade the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):
```bash
pip install -U dfloat11[cuda12]
```
Install the latest `diffusers` package from source:

```bash
pip install git+https://github.com/huggingface/diffusers
```
Save the following code to a Python file `i2v.py`:

```python
import time
import torch
import numpy as np
import argparse
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from dfloat11 import DFloat11Model

parser = argparse.ArgumentParser(description='Image to Video generation using Wan2.2-I2V model')
parser.add_argument('--cpu_offload', action='store_true', help='Enable CPU offloading')
parser.add_argument('--image_path', type=str, default="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG", help='Path or URL to the input image')
parser.add_argument('--width', type=int, default=1280, help='Output video width')
parser.add_argument('--height', type=int, default=720, help='Output video height')
parser.add_argument('--prompt', type=str, default="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.", help='Prompt for video generation')
parser.add_argument('--negative_prompt', type=str, default="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走", help='Negative prompt for video generation')
parser.add_argument('--num_frames', type=int, default=81, help='Number of frames to generate')
parser.add_argument('--guidance_scale', type=float, default=3.5, help='Guidance scale for generation')
parser.add_argument('--num_inference_steps', type=int, default=40, help='Number of inference steps')
parser.add_argument('--seed', type=int, default=42, help='Random seed for generation')
parser.add_argument('--output', type=str, default='i2v_output.mp4', help='Output video path')
parser.add_argument('--fps', type=int, default=16, help='FPS of output video')
args = parser.parse_args()

image = load_image(args.image_path)

# Load the BFloat16 pipeline skeleton, then swap the weights of both expert
# transformers for their DFloat11-compressed counterparts.
pipe = WanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16)
DFloat11Model.from_pretrained(
    "DFloat11/Wan2.2-I2V-A14B-DF11",
    device="cpu",
    cpu_offload=args.cpu_offload,
    bfloat16_model=pipe.transformer,
)
DFloat11Model.from_pretrained(
    "DFloat11/Wan2.2-I2V-A14B-2-DF11",
    device="cpu",
    cpu_offload=args.cpu_offload,
    bfloat16_model=pipe.transformer_2,
)
pipe.enable_model_cpu_offload()

# Resize the input image to fit within the requested area while preserving its
# aspect ratio and snapping to the model's spatial patch size.
max_area = args.width * args.height
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))

generator = torch.Generator(device="cuda").manual_seed(args.seed)

start_time = time.time()
output = pipe(
    image=image,
    prompt=args.prompt,
    negative_prompt=args.negative_prompt,
    height=height,
    width=width,
    num_frames=args.num_frames,
    guidance_scale=args.guidance_scale,
    num_inference_steps=args.num_inference_steps,
    generator=generator,
).frames[0]
print(f"Time taken: {time.time() - start_time:.2f} seconds")

export_to_video(output, args.output, fps=args.fps)

max_memory = torch.cuda.max_memory_allocated()
print(f"Max memory: {max_memory / (1000 ** 3):.2f} GB")
```
To run without CPU offloading (40GB VRAM required):

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py
```

To run with CPU offloading (22.5GB VRAM required):

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py --cpu_offload
```
Setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` is strongly recommended to prevent out-of-memory errors caused by GPU memory fragmentation.
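If you would rather not prefix every command with the environment variable, the same setting can be applied from inside the script; a minimal sketch, assuming it runs at the top of `i2v.py` before any CUDA memory is allocated (the allocator reads the variable lazily on first use):

```python
# Equivalent to the PYTORCH_CUDA_ALLOC_CONF prefix above: set the allocator
# option before torch touches the GPU.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```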