PY007 committed · verified
Commit 3646c57 · 1 Parent(s): 9664edc

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -37,7 +37,7 @@ python setup_vsa.py install
 num_gpus=1
 export FASTVIDEO_ATTENTION_BACKEND=VIDEO_SPARSE_ATTN
 # change model path to local dir if you want to inference using your checkpoint
-export MODEL_BASE=Wan-AI/Wan2.1-T2V-1.3B-Diffusers
+export MODEL_BASE=FastVideo/Wan2.1-VSA-T2V-14B-720P-Diffusers
 # export MODEL_BASE=hunyuanvideo-community/HunyuanVideo
 fastvideo generate \
 --model-path $MODEL_BASE \
@@ -55,7 +55,7 @@ fastvideo generate \
 --prompt "A beautiful woman in a red dress walking down a street" \
 --negative-prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
 --seed 1024 \
---output-path outputs_video_1.3B_VSA/sparsity_0.9/
+--output-path VSA-DMD/sparsity_0.9/
 ```
 - Try it out on **FastVideo** — we support a wide range of GPUs from **H100** to **4090**
 - We use [FastVideo 720P Synthetic Wan dataset](https://huggingface.co/datasets/FastVideo/Wan-Syn_77x768x1280_250k) for training.
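For reference, a minimal sketch of how the updated snippet reads once both hunks are applied. Only lines visible in this diff are included: README lines 44–54 sit between the two hunks and carry additional `fastvideo generate` options that are not shown here, and the long `--negative-prompt` from line 56 is elided for brevity.

```bash
# Sketch assembled from the diff hunks above; options in README lines 44-54
# (outside the shown context) are omitted.
num_gpus=1   # presumably consumed by an option in the elided lines
export FASTVIDEO_ATTENTION_BACKEND=VIDEO_SPARSE_ATTN
# change model path to a local dir to run inference from your own checkpoint
export MODEL_BASE=FastVideo/Wan2.1-VSA-T2V-14B-720P-Diffusers
# export MODEL_BASE=hunyuanvideo-community/HunyuanVideo

fastvideo generate \
  --model-path $MODEL_BASE \
  --prompt "A beautiful woman in a red dress walking down a street" \
  --seed 1024 \
  --output-path VSA-DMD/sparsity_0.9/
```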