# Wan 2.1 W4A16 INT4 Quantized Model
This is a W4A16 INT4 quantized version of the Wan-AI/Wan2.1-T2V-14B-Diffusers model, compressed using the ViDiT-Q quantization framework.
## Model Details
- Base Model: Wan 2.1 Text-to-Video 14B Diffusers
- Quantization Method: W4A16 (4-bit weights, 16-bit activations)
- Group Size: 64 (per-group quantization)
- Framework: ViDiT-Q
- Original Size: ~28 GB (FP16)
- Quantized Size: 9.1 GB
- Compression Ratio: ~3.1x vs FP16 (~5.9x vs the ~53.2 GB FP32 footprint; see the size sketch below)
- Quantized Layers: 400
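As a back-of-the-envelope check on the sizes above (a sketch assuming ~14 billion quantized parameters; exact per-layer counts differ, and FP16/32 layers plus quantization parameters account for the remainder up to 9.1 GB):

```python
# Rough size arithmetic for the figures listed above (assumption: ~14e9
# parameters are quantized; non-quantized layers and zero-points fill the rest).
params = 14e9
int4_bytes = params * 0.5       # 4 bits per weight            -> ~7.0 GB
scale_bytes = params / 64 * 2   # one FP16 scale per 64 weights -> ~0.44 GB
print(f"{int4_bytes / 1e9:.1f} GB packed weights + {scale_bytes / 1e9:.2f} GB scales")
```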
## Quantization Details
This model uses a real packed INT4 storage format, not "fake" quantization that merely simulates 4-bit rounding while keeping FP16 tensors. A minimal packing sketch follows the list:
- Transformer weights are packed into 4-bit integers
- Two weights are stored per byte for maximum storage efficiency
- Quantization parameters (scales, zero-points) are included for reconstruction
- Non-critical layers (embeddings, norms) remain in FP16/FP32
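The sketch below illustrates how such a format can be produced: symmetric per-group quantization with group size 64, a fixed +8 offset standing in for the zero-point, and two 4-bit values packed per `uint8`. The function name and tensor layout are illustrative assumptions, not the actual ViDiT-Q implementation.

```python
import torch

def pack_w4_per_group(weight: torch.Tensor, group_size: int = 64):
    """Symmetric per-group INT4 quantization + 2-per-byte packing (illustrative)."""
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    w = weight.float().reshape(out_features, in_features // group_size, group_size)

    # One scale per group: map the group's max magnitude onto the INT4 max (7).
    scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scales), -8, 7)

    # Shift [-8, 7] -> [0, 15] so each value fits in a nibble, then pack pairs.
    q = (q + 8).to(torch.uint8).reshape(out_features, in_features)
    packed = (q[:, 0::2] << 4) | q[:, 1::2]  # even index -> high nibble
    return packed, scales.squeeze(-1).half()
```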
## Usage
```python
import torch
from your_inference_library import load_quantized_wan21  # placeholder: supply your own loader

# Load the quantized model from this repository
model = load_quantized_wan21("samuelt0207/quantize_wan")

# The loader unpacks INT4 weights on the fly during inference;
# after loading, use the same API as the original Wan 2.1 model.
```
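If you want to work with the raw checkpoint instead, it can be inspected with plain PyTorch. The internal key layout of `wan21_int4_packed.pth` is an assumption here, so list the keys before relying on any of them:

```python
import torch

# Peek at the packed checkpoint. The flat-dict layout assumed here is a
# guess; print the keys to confirm the real structure first.
state = torch.load("wan21_int4_packed.pth", map_location="cpu")
for name, value in list(state.items())[:5]:
    print(name, getattr(value, "dtype", type(value).__name__), getattr(value, "shape", ""))
```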
## Performance
- Memory Usage: ~9.1 GB for weights (vs ~28 GB in FP16 or ~53.2 GB in FP32)
- Speed: comparable to FP16 inference when proper INT4 kernels are used
- Quality: minimal degradation, thanks to per-group quantization
## Technical Details
- Quantization Scheme: Symmetric per-group INT4 quantization
- Group Size: 64 weights per quantization group
- Storage Format: Packed uint8 tensors (2 INT4 values per byte)
- Reconstruction: On-the-fly unpacking during inference (sketched below)
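A matching sketch of the reconstruction step, the inverse of the packing example above; again, the layout is an assumption rather than the checkpoint's actual format:

```python
import torch

def unpack_and_dequant(packed: torch.Tensor, scales: torch.Tensor, group_size: int = 64):
    """Inverse of the packing sketch above: uint8 nibble pairs -> FP16 weights."""
    out_features = packed.shape[0]
    q = torch.empty(out_features, packed.shape[1] * 2, dtype=torch.uint8, device=packed.device)
    q[:, 0::2] = packed >> 4    # high nibble (even indices)
    q[:, 1::2] = packed & 0x0F  # low nibble (odd indices)
    q = q.to(torch.int8) - 8    # undo the [0, 15] shift back to [-8, 7]

    # Apply one scale per 64-weight group, then flatten back to the weight shape.
    w = q.reshape(out_features, -1, group_size).float() * scales.float().unsqueeze(-1)
    return w.reshape(out_features, -1).half()
```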
## Files
- `wan21_int4_packed.pth`: Main model file with packed INT4 weights
- `config.json`: Model configuration and quantization metadata
- `README.md`: This model card
## Citation
If you use this quantized model, please cite both the original Wan 2.1 paper and the ViDiT-Q quantization framework.
## License
Same license as the original Wan 2.1 model. Please check the base model repository for license details.