Wan 2.1 W4A16 INT4 Quantized Model

This is a W4A16 INT4 quantized version of the Wan-AI/Wan2.1-T2V-14B-Diffusers model, compressed using the ViDiT-Q quantization framework.

Model Details

  • Base Model: Wan 2.1 Text-to-Video 14B Diffusers
  • Quantization Method: W4A16 (4-bit weights, 16-bit activations)
  • Group Size: 64 (per-group quantization)
  • Framework: ViDiT-Q
  • Original Size: ~28GB (FP16)
  • Quantized Size: 9.1GB
  • Compression Ratio: 5.9x vs. the 53.2GB FP32 checkpoint (~3.1x vs. FP16)
  • Quantized Layers: 400

Quantization Details

This model uses a real INT4 storage format (weights are physically stored at 4 bits), not fake quantization that keeps full-precision tensors and merely simulates rounding; a minimal packing sketch follows the list below:

  • Transformer weights are packed into 4-bit integers
  • 2 weights stored per byte for maximum efficiency
  • Quantization parameters (scales, zero_points) included for reconstruction
  • Non-critical layers (embeddings, norms) remain in FP16/32
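
As a minimal illustration of the packing scheme (not the actual ViDiT-Q code, whose tensor layout may differ), two unsigned 4-bit codes in [0, 15] can be packed into and unpacked from one uint8 like this:

import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    # q: integer tensor of 4-bit codes in [0, 15], with an even element count.
    q = q.to(torch.uint8).flatten()
    # Low nibble holds even-indexed codes, high nibble holds odd-indexed codes.
    return q[0::2] | (q[1::2] << 4)

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    # Interleave the nibbles back into the original order.
    return torch.stack([low, high], dim=1).flatten()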

Usage

import torch

# `your_inference_library` is a placeholder; substitute the loader
# that ships with your inference stack.
from your_inference_library import load_quantized_wan21

# Load the quantized model from the Hub (replace with the actual repo id)
model = load_quantized_wan21('your-username/wan21-w4a16-int4')

# INT4 weights are unpacked automatically during inference;
# the model then exposes the same API as the original Wan 2.1 model.

Performance

  • Memory Usage: ~9.1GB (vs. the 53.2GB FP32 original; a back-of-envelope breakdown follows this list)
  • Speed: Comparable to FP16 inference when optimized INT4 kernels are used; naive unpacking adds overhead
  • Quality: Minimal degradation with per-group quantization
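
A rough sanity check of the memory figure, assuming ~14B quantized parameters and one FP16 scale per group of 64 weights (my own estimate, not an official breakdown):

# 4-bit codes plus an FP16 scale amortized over each group of 64 weights
params = 14e9
bits_per_weight = 4 + 16 / 64            # = 4.25 effective bits per weight
packed_gb = params * bits_per_weight / 8 / 1e9
print(f"{packed_gb:.1f} GB")             # ~7.4 GB
# Layers kept in FP16/32 (embeddings, norms, etc.) account for the
# rest of the reported ~9.1 GB.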

Technical Details

  • Quantization Scheme: Symmetric per-group INT4 quantization
  • Group Size: 64 weights per quantization group
  • Storage Format: Packed uint8 tensors (2 INT4 values per byte)
  • Reconstruction: On-the-fly unpacking during inference (sketched below)
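
A sketch of per-group symmetric quantization and reconstruction under the parameters above (group size 64; symmetric codes shifted by a fixed zero point of 8 so they fit the unsigned [0, 15] packing range). This is illustrative only; the actual ViDiT-Q implementation may differ:

import torch

GROUP_SIZE = 64

def quantize_per_group(w):
    # w: FP16/FP32 weight tensor whose element count is a multiple of GROUP_SIZE.
    groups = w.float().flatten().reshape(-1, GROUP_SIZE)
    # Symmetric: each group's max magnitude maps to the INT4 limit (7).
    scales = groups.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    codes = torch.clamp(torch.round(groups / scales), -8, 7)
    # Shift by the fixed zero point of 8 so codes land in [0, 15] for packing.
    return (codes + 8).to(torch.uint8), scales.half()

def dequantize_per_group(codes, scales):
    groups = (codes.float() - 8.0).reshape(-1, GROUP_SIZE)
    return (groups * scales.float()).flatten().half()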

Files

  • wan21_int4_packed.pth: Main model file with packed INT4 weights (see the inspection sketch after this list)
  • config.json: Model configuration and quantization metadata
  • README.md: This model card
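
To inspect the checkpoint contents, something like the following should work, assuming the .pth file is a flat state dict of tensors plus quantization metadata (the exact key layout is my assumption; check the actual file):

import torch

state = torch.load('wan21_int4_packed.pth', map_location='cpu')
for name, value in state.items():
    if torch.is_tensor(value):
        print(name, tuple(value.shape), value.dtype)
# Packed weights should show up as uint8 tensors, with FP16 scale
# (and zero_point) tensors stored alongside each quantized layer.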

Citation

If you use this quantized model, please cite both the original Wan 2.1 paper and the ViDiT-Q quantization framework.

License

Same license as the original Wan 2.1 model. Please check the base model repository for license details.
