BianYx committed on
Commit c775866 · verified · 1 Parent(s): 9e856aa

Update README.md

Files changed (1)
  1. README.md +32 -31
README.md CHANGED
@@ -12,6 +12,7 @@ tags:
 - video editing
 ---
 
+
 # VideoPainter
 
 This repository contains the implementation of the paper "VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
@@ -24,28 +25,21 @@ Keywords: Video Inpainting, Video Editing, Video Generation
 
 
 <p align="center">
- <a href="https://yxbian23.github.io/project/video-painter">🌐Project Page</a> |
- <a href="https://arxiv.org/abs/2503.05639">📜Arxiv</a> |
- <a href="https://huggingface.co/collections/TencentARC/videopainter-67cc49c6146a48a2ba93d159">🗄️Data</a> |
- <a href="https://youtu.be/HYzNfsD3A0s">📹Video</a> |
- <a href="https://huggingface.co/TencentARC/VideoPainter">🤗Hugging Face Model</a> |
+ <a href='https://yxbian23.github.io/project/video-painter'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href="https://arxiv.org/abs/2503.05639"><img src="https://img.shields.io/badge/arXiv-2503.05639-b31b1b.svg"></a> <a href="https://youtu.be/HYzNfsD3A0s"><img src="https://img.shields.io/badge/YouTube-Video-red?logo=youtube"></a> <a href="https://github.com/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github"></a> <a href='https://huggingface.co/datasets/TencentARC/VPData'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue'></a> <a href='https://huggingface.co/datasets/TencentARC/VPBench'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Benchmark-blue'></a> <a href="https://huggingface.co/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a>
 </p>
 
+ **Your likes and stars mean a lot to us as we develop this project!** ❤️
+
 
 **📖 Table of Contents**
 
 
 - [VideoPainter](#videopainter)
 - [🔥 Update Log](#-update-log)
- - [📌 TODO](#todo)
+ - [TODO](#todo)
 - [🛠️ Method Overview](#️-method-overview)
 - [🚀 Getting Started](#-getting-started)
- - [Environment Requirement 🌍](#environment-requirement-)
- - [Data Download ⬇️](#data-download-️)
 - [🏃🏼 Running Scripts](#-running-scripts)
- - [Training 🤯](#training-)
- - [Inference 📜](#inference-)
- - [Evaluation 📝](#evaluation-)
 - [🤝🏼 Cite Us](#-cite-us)
 - [💖 Acknowledgement](#-acknowledgement)
 
@@ -66,13 +60,14 @@ Keywords: Video Inpainting, Video Editing, Video Generation
 ## 🛠️ Method Overview
 
 We propose VideoPainter, a novel dual-stream paradigm that incorporates an efficient context encoder (comprising only 6\% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues into any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target region ID resampling technique that enables any-length video inpainting, greatly enhancing practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench, the largest video inpainting dataset and benchmark to date with over 390K diverse clips, to facilitate segmentation-based inpainting training and assessment. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential.
- ![](assets/method.jpg)
+ ![](assets/teaser.jpg)
 
 
 
 ## 🚀 Getting Started
 
- ### Environment Requirement 🌍
+ <details>
+ <summary><b>Environment Requirement 🌍</b></summary>
 
 
 Clone the repo:
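
The method overview in the hunk above describes the core mechanism: a small, trainable context encoder reads the masked video and injects background cues into a frozen, pre-trained video DiT. A minimal PyTorch sketch of that idea follows; every module name, size, and injection point here is an illustrative assumption, not the repository's actual implementation.

```python
# Sketch of the dual-stream idea, assuming toy shapes and injection points.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Small trainable encoder over masked-video tokens (a few percent of backbone size)."""
    def __init__(self, dim: int, depth: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True) for _ in range(depth)]
        )
        # Zero-initialized projections so the injection starts as a no-op (adapter-style).
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        for p in self.proj:
            nn.init.zeros_(p.weight)
            nn.init.zeros_(p.bias)

    def forward(self, ctx_tokens):
        cues, h = [], ctx_tokens
        for blk, proj in zip(self.blocks, self.proj):
            h = blk(h)
            cues.append(proj(h))  # backbone-aware background cue per injection point
        return cues

class FrozenDiTWithContext(nn.Module):
    """Frozen backbone plus plug-and-play context stream: only the encoder trains."""
    def __init__(self, backbone: nn.ModuleList, dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.context_encoder = ContextEncoder(dim)

    def forward(self, noisy_tokens, masked_video_tokens):
        cues = self.context_encoder(masked_video_tokens)
        h = noisy_tokens
        for i, blk in enumerate(self.backbone):
            if i < len(cues):  # inject into the first few blocks (assumed placement)
                h = h + cues[i]
            h = blk(h)
        return h

# Toy check: 16 tokens of width 64 through a 4-block stand-in backbone.
dim = 64
backbone = nn.ModuleList(
    [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True) for _ in range(4)]
)
out = FrozenDiTWithContext(backbone, dim)(torch.randn(1, 16, dim), torch.randn(1, 16, dim))
print(out.shape)  # torch.Size([1, 16, 64])
```

Zero-initializing the projections keeps the frozen backbone's behavior intact at the start of training, which is one common way such adapters are wired; the paper's actual injection rule may differ.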
@@ -109,8 +104,10 @@ Optionally, you can install sam2 for the gradio demo through:
 cd ./app
 pip install -e .
 ```
+ </details>
 
- ### Data Download ⬇️
+ <details>
+ <summary><b>Data Download ⬇️</b></summary>
 
 
 **VPBench and VPData**
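
For orientation while reading this hunk, here is the clone-and-install flow it documents, sketched end to end. The GitHub URL comes from the badge row added earlier in this diff; the requirements file name is an assumption, so defer to the repository README for the authoritative steps.

```bash
# Hypothetical setup sketch; file names beyond this diff are assumptions.
git clone https://github.com/TencentARC/VideoPainter.git
cd VideoPainter
pip install -r requirements.txt  # assumed dependency file, not shown in this diff

# Optional: install sam2 for the gradio demo, as the hunk above shows
cd ./app
pip install -e .
```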
@@ -186,8 +183,10 @@ cd data_utils
 python VPData_download.py
 ```
 
+ </details>
 
- **Checkpoints**
+ <details>
+ <summary><b>Checkpoints</b></summary>
 
 Checkpoints of VideoPainter can be downloaded from [here](https://huggingface.co/TencentARC/VideoPainter). The ckpt folder contains
 
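Alongside the `data_utils/VPData_download.py` helper shown above, the same artifacts can presumably be fetched directly with `huggingface_hub`. The repo IDs below come from this README's own links; the local directory layout is an illustrative assumption (the required ckpt structure is spelled out in the next hunk).

```python
# Sketch: pull the model and datasets referenced in this README via huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="TencentARC/VideoPainter", local_dir="./ckpt")  # assumed target dir
snapshot_download(repo_id="TencentARC/VPBench", repo_type="dataset", local_dir="./data/VPBench")
snapshot_download(repo_id="TencentARC/VPData", repo_type="dataset", local_dir="./data/VPData")
```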
@@ -239,12 +238,12 @@ The ckpt structure should be like:
 |-- vae
 |-- ...
 ```
-
+ </details>
 
 ## 🏃🏼 Running Scripts
 
-
- ### Training 🤯
+ <details>
+ <summary><b>Training 🤯</b></summary>
 
 You can train VideoPainter using the script:
 
@@ -387,11 +386,11 @@ accelerate launch --config_file accelerate_config_machine_single_ds_wo_cpu.yaml
 --p_random_brush 0.3 \
 --id_pool_resample_learnable
 ```
+ </details>
 
 
-
-
- ### Inference 📜
+ <details>
+ <summary><b>Inference 📜</b></summary>
 
 You can run inference for video inpainting or editing with the script:
 
@@ -411,7 +410,10 @@ bash edit_bench.sh
 ```
 
 Since VideoPainter is trained on public Internet videos, it primarily performs well in general scenarios. For high-quality industrial applications (e.g., product exhibitions, virtual try-on), we recommend training the model on your domain-specific data. We welcome and appreciate any contributions of trained models from the community!
+ </details>
 
+ <details>
+ <summary><b>Gradio Demo 🖌️</b></summary>
 
 You can also run inference through the gradio demo:
 
@@ -423,9 +425,11 @@ CUDA_VISIBLE_DEVICES=0 python app.py \
 --id_adapter ../ckpt/VideoPainterID/checkpoints \
 --img_inpainting_model ../ckpt/flux_inp
 ```
+ </details>
 
 
- ### Evaluation 📝
+ <details>
+ <summary><b>Evaluation 📝</b></summary>
 
 You can evaluate using the script:
 
@@ -440,19 +444,16 @@ bash eval_edit.sh
 # video editing with ID resampling
 bash eval_editing_id_resample.sh
 ```
-
+ </details>
 
 ## 🤝🏼 Cite Us
 
 ```
- @misc{bian2025videopainteranylengthvideoinpainting,
-   title={VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control},
-   author={Yuxuan Bian and Zhaoyang Zhang and Xuan Ju and Mingdeng Cao and Liangbin Xie and Ying Shan and Qiang Xu},
-   year={2025},
-   eprint={2503.05639},
-   archivePrefix={arXiv},
-   primaryClass={cs.CV},
-   url={https://arxiv.org/abs/2503.05639},
+ @article{bian2025videopainter,
+   title={VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control},
+   author={Bian, Yuxuan and Zhang, Zhaoyang and Ju, Xuan and Cao, Mingdeng and Xie, Liangbin and Shan, Ying and Xu, Qiang},
+   journal={arXiv preprint arXiv:2503.05639},
+   year={2025}
 }
 ```