DFloat11 Compressed Model: Wan-AI/Wan2.2-I2V-A14B

This is a DFloat11 losslessly compressed version of the original Wan-AI/Wan2.2-I2V-A14B model. It reduces model size by 32% compared to the original BFloat16 model, while maintaining bit-identical outputs and supporting efficient GPU inference.
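As a rough sanity check on that figure (a minimal sketch, assuming the name DFloat11 refers to an effective ~11 bits per weight, and reusing the checkpoint sizes reported in the table below):

    # ~11 bits per weight instead of BFloat16's 16 bits predicts ~31% savings
    print(1 - 11 / 16)              # 0.3125

    # The checkpoint sizes reported in the table below tell the same story
    bf16_gb = 56.0                  # approximate size of the BFloat16 model
    df11_gb = 19.47 + 19.44         # the two DFloat11 transformer parts
    print(1 - df11_gb / bf16_gb)    # ~0.31, consistent with the ~32% reduction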

🔥🔥🔥 Thanks to DFloat11 compression, Wan-AI/Wan2.2-I2V-A14B can now generate a 5-second 720P video on a single 24GB GPU, while maintaining full model quality. 🔥🔥🔥

📊 Performance Comparison

| Model | Model Size | Peak GPU Memory (5-second 720P generation) | Generation Time (A100 GPU) |
|---|---|---|---|
| Wan-AI/Wan2.2-I2V-A14B (BFloat16) | ~56 GB | O.O.M. | - |
| Wan-AI/Wan2.2-I2V-A14B (DFloat11) | 19.47 + 19.44 GB | 29.12 GB | 42 minutes |
| Wan-AI/Wan2.2-I2V-A14B (DFloat11 + CPU Offloading) | 19.47 + 19.44 GB | 20.01 GB | 44 minutes |

🔍 How It Works

We apply Huffman coding to the exponent bits of BFloat16 model weights, which are highly compressible. We leverage hardware-aware algorithmic designs to enable highly efficient, on-the-fly weight decompression directly on the GPU. Find out more in our research paper.
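The sketch below is a minimal illustration of that idea in plain Python, not the actual DFloat11 format or GPU kernel: it extracts the exponent bits from a BFloat16 tensor, builds Huffman code lengths for their (highly skewed) distribution, and estimates the resulting average bits per weight.

    import heapq
    from collections import Counter

    import torch

    # Illustrative only: estimate how compressible the exponent bits of a
    # BFloat16 tensor are. A BFloat16 value is laid out as
    # [1 sign bit | 8 exponent bits | 7 mantissa bits].
    w = torch.randn(1_000_000).to(torch.bfloat16)
    raw = w.view(torch.int16).to(torch.int32) & 0xFFFF
    exponents = ((raw >> 7) & 0xFF).tolist()

    # Huffman code lengths for the observed exponent distribution
    freq = Counter(exponents)
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, uid, merged))
        uid += 1
    code_lengths = heap[0][2]

    avg_exp_bits = sum(freq[s] * code_lengths[s] for s in freq) / len(exponents)
    # Sign + mantissa (8 bits) stay uncompressed; only the exponent is entropy-coded
    print(f"average bits per weight: {8 + avg_exp_bits:.2f} (vs. 16 for BFloat16)")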

🔧 How to Use

  1. Install or upgrade the DFloat11 pip package (this installs the CUDA kernel automatically; it requires a CUDA-compatible GPU and an existing PyTorch installation):

    pip install -U dfloat11[cuda12]
    
  2. Install the latest diffusers package from source:

    pip install git+https://github.com/huggingface/diffusers
    
  3. Save the following code to a Python file named i2v.py:

    import time
    import torch
    import numpy as np
    import argparse
    from diffusers import WanImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image
    from dfloat11 import DFloat11Model
    
    parser = argparse.ArgumentParser(description='Image to Video generation using Wan2.2-I2V model')
    parser.add_argument('--cpu_offload', action='store_true', help='Enable CPU offloading')
    parser.add_argument('--image_path', type=str, default="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG", help='Path or URL to the input image')
    parser.add_argument('--width', type=int, default=1280, help='Output video width')
    parser.add_argument('--height', type=int, default=720, help='Output video height')
    parser.add_argument('--prompt', type=str, default="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.", help='Prompt for video generation')
    parser.add_argument('--negative_prompt', type=str, default="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走", help='Negative prompt for video generation')
    parser.add_argument('--num_frames', type=int, default=81, help='Number of frames to generate')
    parser.add_argument('--guidance_scale', type=float, default=3.5, help='Guidance scale for generation')
    parser.add_argument('--num_inference_steps', type=int, default=40, help='Number of inference steps')
    parser.add_argument('--seed', type=int, default=42, help='Random seed for generation')
    parser.add_argument('--output', type=str, default='i2v_output.mp4', help='Output video path')
    parser.add_argument('--fps', type=int, default=16, help='FPS of output video')
    
    args = parser.parse_args()
    
    image = load_image(args.image_path)
    
    # Load the Wan2.2 image-to-video pipeline in BFloat16
    pipe = WanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16)
    
    # Swap in the DFloat11-compressed weights for both transformer components
    DFloat11Model.from_pretrained(
        "DFloat11/Wan2.2-I2V-A14B-DF11",
        device="cpu",
        cpu_offload=args.cpu_offload,
        bfloat16_model=pipe.transformer,
    )
    DFloat11Model.from_pretrained(
        "DFloat11/Wan2.2-I2V-A14B-2-DF11",
        device="cpu",
        cpu_offload=args.cpu_offload,
        bfloat16_model=pipe.transformer_2,
    )
    
    pipe.enable_model_cpu_offload()
    
    # Resize the input image so its area fits within width x height while keeping
    # the aspect ratio and aligning dimensions to the model's spatial granularity
    max_area = args.width * args.height
    aspect_ratio = image.height / image.width
    mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
    height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    image = image.resize((width, height))
    
    generator = torch.Generator(device="cuda").manual_seed(args.seed)
    
    start_time = time.time()
    output = pipe(
        image=image,
        prompt=args.prompt,
        negative_prompt=args.negative_prompt,
        height=height,
        width=width,
        num_frames=args.num_frames,
        guidance_scale=args.guidance_scale,
        num_inference_steps=args.num_inference_steps,
        generator=generator,
    ).frames[0]
    print(f"Time taken: {time.time() - start_time:.2f} seconds")
    
    export_to_video(output, args.output, fps=args.fps)
    
    max_memory = torch.cuda.max_memory_allocated()
    print(f"Max memory: {max_memory / (1000 ** 3):.2f} GB")
    
  4. To run without CPU offloading (40GB VRAM required):

    PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py
    

    To run with CPU offloading (22.5GB VRAM required):

    PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py --cpu_offload
    

    Setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is strongly recommended to prevent out-of-memory errors caused by GPU memory fragmentation.
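
    If you prefer not to set the variable on the command line, an equivalent option (a minimal sketch; the setting must take effect before the first CUDA allocation) is to export it at the very top of i2v.py:

    import os
    # Keep this at the very top of the script, before `import torch`
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")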

📄 Learn More
