IceClear committed
Commit f61c793 · 1 Parent(s): 42f2c22

update readme

Files changed (1)
  1. README.md +13 -165
README.md CHANGED
@@ -1,165 +1,13 @@
- <div align="center">
- <img src="assets/seedvr_logo.png" alt="SeedVR" width="400"/>
- </div>
-
- # SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
- > [Jianyi Wang](https://iceclear.github.io), [Zhijie Lin](https://scholar.google.com/citations?user=xXMj6_EAAAAJ&hl=zh-CN), [Meng Wei](https://openreview.net/profile?id=~Meng_Wei11), [Ceyuan Yang](https://scholar.google.com/citations?user=uPmTOHAAAAAJ&hl=zh-CN), [Fei Xiao](https://openreview.net/profile?id=~Fei_xiao8), [Chen Change Loy](https://www.mmlab-ntu.com/person/ccloy/), [Lu Jiang](http://www.lujiang.info/)
- >
- > **CVPR 2025 (Highlight)**
-
- <p>
-   <a href="https://iceclear.github.io/projects/seedvr/">
-     <img
-       src="https://img.shields.io/badge/SeedVR-Website-0A66C2?logo=safari&logoColor=white"
-       alt="SeedVR Website"
-     />
-   </a>
-   <a href="https://huggingface.co/models?other=seedvr">
-     <img
-       src="https://img.shields.io/badge/SeedVR-Models-yellow?logo=huggingface&logoColor=yellow"
-       alt="SeedVR Models"
-     />
-   </a>
-   <a href="https://arxiv.org/abs/2501.01320">
-     <img
-       src="https://img.shields.io/badge/SeedVR-Paper-red?logo=arxiv&logoColor=red"
-       alt="SeedVR Paper on ArXiv"
-     />
-   </a>
-   <a href="https://www.youtube.com/watch?v=aPpBs_B2iCY" target='_blank'>
-     <img
-       src="https://img.shields.io/badge/Demo%20Video-%23FF0000.svg?logo=YouTube&logoColor=white"
-       alt="SeedVR Video Demo on YouTube"
-     />
-   </a>
- </p>
-
- >
- > **Why SeedVR:** Conventional restoration models perform poorly on both real-world and AIGC video restoration due to limited generation ability. Recent diffusion-based models improve performance by introducing a diffusion prior via ControlNet-like or adapter-like architectures. Despite these gains, such methods inherit the constraints of the diffusion prior: they share its biases, e.g., limited generation ability on small text and faces, and they only work at fixed resolutions such as 512 or 1024. As a result, most existing diffusion-based restoration models rely on patch-based sampling, i.e., dividing the input video into overlapping spatio-temporal patches and fusing the patches with a Gaussian kernel at each diffusion step. The large overlap (e.g., 50% of the patch size) required for a coherent output without visible patch boundaries often leads to considerably slow inference, and this inefficiency becomes even more pronounced when processing long videos at high resolutions. SeedVR follows state-of-the-art video generation training pipelines to tackle the key challenge in diffusion-based restoration: it enables arbitrary-resolution restoration without relying on any pretrained diffusion prior, while introducing advanced video generation techniques suited to video restoration. As the largest-ever diffusion transformer for generic video restoration, we hope SeedVR can push the frontiers of advanced VR and inspire future research on large vision models for real-world video restoration.
-
-
- # SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
- > [Jianyi Wang](https://iceclear.github.io), [Shanchuan Lin](https://scholar.google.com/citations?user=EDWUw7gAAAAJ&hl=en), [Zhijie Lin](https://scholar.google.com/citations?user=xXMj6_EAAAAJ&hl=en), [Yuxi Ren](https://scholar.google.com.hk/citations?user=C_6JH-IAAAAJ&hl=en), [Meng Wei](https://openreview.net/profile?id=~Meng_Wei11), [Zongsheng Yue](https://zsyoaoa.github.io/), [Shangchen Zhou](https://shangchenzhou.com/), [Hao Chen](https://haochen-rye.github.io/), [Yang Zhao](https://scholar.google.com/citations?user=uPmTOHAAAAAJ&hl=en), [Ceyuan Yang](https://ceyuan.me/), [Xuefeng Xiao](https://scholar.google.com/citations?user=CVkM9TQAAAAJ&hl=en), [Chen Change Loy](https://www.mmlab-ntu.com/person/ccloy/index.html), [Lu Jiang](http://www.lujiang.info/)
-
- <p>
-   <a href="https://iceclear.github.io/projects/seedvr2/">
-     <img
-       src="https://img.shields.io/badge/SeedVR2-Website-0A66C2?logo=safari&logoColor=white"
-       alt="SeedVR2 Website"
-     />
-   </a>
-   <a href="https://huggingface.co/models?other=seedvr">
-     <img
-       src="https://img.shields.io/badge/SeedVR2-Models-yellow?logo=huggingface&logoColor=yellow"
-       alt="SeedVR2 Models"
-     />
-   </a>
-   <a href="http://arxiv.org/abs/2506.05301">
-     <img
-       src="https://img.shields.io/badge/SeedVR2-Paper-red?logo=arxiv&logoColor=red"
-       alt="SeedVR2 Paper on ArXiv"
-     />
-   </a>
-   <a href="https://www.youtube.com/watch?v=tM8J-WhuAH0" target='_blank'>
-     <img
-       src="https://img.shields.io/badge/Demo%20Video-%23FF0000.svg?logo=YouTube&logoColor=white"
-       alt="SeedVR2 Video Demo on YouTube"
-     />
-   </a>
- </p>
-
- >
- > Recent advances in diffusion-based video restoration (VR) bring significant improvement in visual quality, yet incur prohibitive computational cost at inference. While several distillation-based approaches have shown the potential of one-step image restoration, extending them to VR remains challenging and underexplored due to limited generation ability and poor temporal consistency, particularly when dealing with high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle challenging high-resolution VR within a single step, we introduce several enhancements to both the model architecture and the training procedure. Specifically, we propose an adaptive window attention mechanism in which the window size is dynamically adjusted to fit the output resolution, avoiding the window inconsistency observed in high-resolution VR when window attention uses a predefined window size. To stabilize and improve adversarial post-training for VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss, without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 achieves comparable or even better performance than existing VR approaches in a single step.
-
- <p align="center"><img src="assets/teaser.png" width="100%"></p>
-
-
- ## 📢 News
-
- We sincerely thank all contributors from the open community for their valuable support.
-
- - **June 2025:** Repo created.
-
-
- ## 📮 Notice
- **Limitations:** These are prototype models, and their performance may not perfectly align with the paper. Our methods are sometimes not robust to heavy degradations and very large motions, and they share some failure cases with existing methods, e.g., failing to fully remove the degradation or simply generating unpleasing details. Moreover, due to their strong generation ability, our methods tend to over-generate details on inputs with very light degradations, e.g., 720p AIGC videos, occasionally leading to oversharpened results.
-
-
- ## 🔥 Quick Start
-
- 1️⃣ Set up environment
- ```bash
- git clone https://github.com/bytedance-seed/SeedVR.git
- cd SeedVR
- conda create -n seedvr python=3.10 -y
- conda activate seedvr
- pip install -r requirements.txt
- pip install flash_attn==2.5.9.post1 --no-build-isolation
- ```
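-
- A quick sanity check (our suggestion, not from the original README): both imports should succeed if the environment resolved correctly.
- ```bash
- # Optional: verify that torch and flash_attn import cleanly.
- python -c "import torch, flash_attn; print(torch.__version__, flash_attn.__version__)"
- ```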
-
- Install [apex](https://github.com/NVIDIA/apex).
- ```bash
- # These community mirrors are untested and may not work in every environment:
- pip install git+https://github.com/andreinechaev/nv-apex.git
- # Or
- pip install git+https://github.com/huggingface/apex.git
- ```
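-
- Since these mirrors are unofficial, a quick import check (again our suggestion) is worthwhile:
- ```bash
- # Should print the package path if apex installed correctly.
- python -c "import apex; print(apex.__file__)"
- ```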
-
- To use color fix, place the file [color_fix.py](https://github.com/pkuliyi2015/sd-webui-stablesr/blob/master/srmodule/colorfix.py) at `./projects/video_diffusion_sr/color_fix.py`, e.g., as in the sketch below.
-
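- One way to fetch it (a sketch; the raw URL is inferred from the GitHub link above and not verified):
- ```bash
- mkdir -p projects/video_diffusion_sr
- # Note the rename: the upstream file is colorfix.py, the expected name is color_fix.py.
- wget -O projects/video_diffusion_sr/color_fix.py \
-   https://raw.githubusercontent.com/pkuliyi2015/sd-webui-stablesr/master/srmodule/colorfix.py
- ```
-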
- 2️⃣ Download pretrained checkpoint
- ```python
- # Take SeedVR2-3B as an example.
- # See all models: https://huggingface.co/models?other=seedvr
- from huggingface_hub import snapshot_download
-
- save_dir = "ckpts/"
- repo_id = "ByteDance-Seed/SeedVR2-3B"
- cache_dir = save_dir + "cache"
-
- snapshot_download(
-     cache_dir=cache_dir,
-     local_dir=save_dir,
-     repo_id=repo_id,
-     local_dir_use_symlinks=False,
-     resume_download=True,
-     allow_patterns=["*.json", "*.safetensors", "*.pth", "*.bin", "*.py", "*.md", "*.txt"],
- )
- ```
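-
- The same snapshot can also be fetched from the command line (an equivalent sketch using the `huggingface-cli` tool; not from the original README):
- ```bash
- # Downloads ByteDance-Seed/SeedVR2-3B into ckpts/.
- huggingface-cli download ByteDance-Seed/SeedVR2-3B --local-dir ckpts/
- ```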
-
- ## 🔥 Inference
-
- Set the required options in the inference scripts before running.
-
- **GPU Requirement:** We adopt sequence parallelism to enable multi-GPU inference: a single H100-80G can handle videos of 100×720×1280 (frames × height × width), and four H100-80Gs further support 1080p and 2K videos. We will support more inference tricks like [Tile-VAE](https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111) and [Progressive Aggregation Sampling](https://github.com/IceClear/StableSR) in the future.
-
- ```bash
- # Take the 3B SeedVR2 model inference script as an example.
- torchrun --nproc-per-node=NUM_GPUS projects/inference_seedvr2_3b.py --video_path INPUT_FOLDER --output_dir OUTPUT_FOLDER --seed SEED_NUM --res_h OUTPUT_HEIGHT --res_w OUTPUT_WIDTH --sp_size NUM_SP
- ```
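-
- For example, a hypothetical single-node run on 4 GPUs at 1280×720 with sequence-parallel size 4 (all values are placeholders):
- ```bash
- torchrun --nproc-per-node=4 projects/inference_seedvr2_3b.py \
-     --video_path ./input_videos --output_dir ./results \
-     --seed 42 --res_h 720 --res_w 1280 --sp_size 4
- ```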
-
-
- ## ✍️ Citation
-
- ```bibtex
- @article{wang2025seedvr2,
-   title={SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training},
-   author={Wang, Jianyi and Lin, Shanchuan and Lin, Zhijie and Ren, Yuxi and Wei, Meng and Yue, Zongsheng and Zhou, Shangchen and Chen, Hao and Zhao, Yang and Yang, Ceyuan and Xiao, Xuefeng and Loy, Chen Change and Jiang, Lu},
-   journal={arXiv preprint arXiv:2506.05301},
-   year={2025}
- }
-
- @inproceedings{wang2025seedvr,
-   title={SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration},
-   author={Wang, Jianyi and Lin, Zhijie and Wei, Meng and Zhao, Yang and Yang, Ceyuan and Loy, Chen Change and Jiang, Lu},
-   booktitle={CVPR},
-   year={2025}
- }
- ```
-
-
- ## 📜 License
- SeedVR and SeedVR2 are licensed under the Apache License 2.0.
 
+ ---
+ title: SeedVR2-3B
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.29.0
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: SeedVR2-3B Video API Demo
+ ---
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference