Update README.md
README.md
CHANGED
@@ -12,6 +12,7 @@ tags:
- video editing
---

+
# VideoPainter

This repository contains the implementation of the paper "VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control"

@@ -24,28 +25,21 @@ Keywords: Video Inpainting, Video Editing, Video Generation


<p align="center">
-
-<a href="https://arxiv.org/abs/2503.05639">Arxiv</a> |
-<a href="https://huggingface.co/collections/TencentARC/videopainter-67cc49c6146a48a2ba93d159">Data</a> |
-<a href="https://youtu.be/HYzNfsD3A0s">📹Video</a> |
-<a href="https://huggingface.co/TencentARC/VideoPainter">🤗Hugging Face Model</a> |
+<a href='https://yxbian23.github.io/project/video-painter'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href="https://arxiv.org/abs/2503.05639"><img src="https://img.shields.io/badge/arXiv-2503.05639-b31b1b.svg"></a> <a href="https://youtu.be/HYzNfsD3A0s"><img src="https://img.shields.io/badge/YouTube-Video-red?logo=youtube"></a> <a href="https://github.com/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github"></a> <a href='https://huggingface.co/datasets/TencentARC/VPData'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue'></a> <a href='https://huggingface.co/datasets/TencentARC/VPBench'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Benchmark-blue'></a> <a href="https://huggingface.co/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a>
</p>

+**Your likes and stars mean a lot to us as we develop this project!** ❤️
+

**📖 Table of Contents**


- [VideoPainter](#videopainter)
- [🔥 Update Log](#-update-log)
-  - [
+  - [TODO](#todo)
- [🛠️ Method Overview](#-method-overview)
- [🚀 Getting Started](#-getting-started)
-  - [Environment Requirement 🌍](#environment-requirement-)
-  - [Data Download ⬇️](#data-download-)
- [🏃🏼 Running Scripts](#-running-scripts)
-  - [Training 🤯](#training-)
-  - [Inference 📜](#inference-)
-  - [Evaluation 📏](#evaluation-)
- [🤝🏼 Cite Us](#-cite-us)
- [💖 Acknowledgement](#-acknowledgement)

@@ -66,13 +60,14 @@ Keywords: Video Inpainting, Video Editing, Video Generation
## 🛠️ Method Overview

We propose VideoPainter, a novel dual-stream paradigm that incorporates an efficient context encoder (comprising only 6\% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues into any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target-region ID resampling technique that enables any-length video inpainting, greatly enhancing practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench, the largest video inpainting dataset and benchmark to date with over 390K diverse clips, to facilitate segmentation-based inpainting training and assessment. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential.
-
+


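To make the dual-stream description above concrete, here is a minimal, illustrative sketch. It is not the repository's implementation: a frozen stand-in for one pre-trained DiT block plus a small trainable context encoder whose output is added to the backbone's input tokens, mimicking plug-and-play injection of masked-video context. All module names, shapes, and the injection point are hypothetical simplifications.

```python
# Illustrative sketch only, NOT the VideoPainter implementation: a frozen
# stand-in for a pre-trained DiT block plus a small trainable context encoder
# whose output is added to the backbone's input tokens (plug-and-play style).
import torch
import torch.nn as nn

class TinyBackboneBlock(nn.Module):
    """Stands in for one block of a pre-trained video DiT (kept frozen)."""
    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class TinyContextEncoder(nn.Module):
    """Small trainable branch that turns masked-video tokens into context cues."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, masked_tokens):
        return self.proj(masked_tokens)

backbone = TinyBackboneBlock()
for p in backbone.parameters():      # backbone stays frozen; only the context branch would train
    p.requires_grad_(False)
context_encoder = TinyContextEncoder()

noisy_tokens = torch.randn(1, 16, 64)    # toy latent tokens being denoised
masked_context = torch.randn(1, 16, 64)  # toy tokens from the masked input video
out = backbone(noisy_tokens + context_encoder(masked_context))
print(out.shape)                         # torch.Size([1, 16, 64])
```
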
## 🚀 Getting Started

+<details>
+<summary><b>Environment Requirement 🌍</b></summary>


Clone the repo:
...
cd ./app
pip install -e .
```
+</details>

+<details>
+<summary><b>Data Download ⬇️</b></summary>


**VPBench and VPData**
...
python VPData_download.py
```

+</details>
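For reference, the dataset repositories linked in the badges above can also be fetched programmatically with huggingface_hub. This is a hedged sketch: the repo ids come from the badge links, the local paths are placeholders, and the download script above remains the canonical route.

```python
# Hypothetical alternative to the script above (local paths are placeholders).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="TencentARC/VPBench", repo_type="dataset", local_dir="./data/VPBench")
snapshot_download(repo_id="TencentARC/VPData", repo_type="dataset", local_dir="./data/VPData")
```
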

+<details>
+<summary><b>Checkpoints</b></summary>

Checkpoints of VideoPainter can be downloaded from [here](https://huggingface.co/TencentARC/VideoPainter). The ckpt folder contains
@@ -239,12 +238,12 @@ The ckpt structure should be like:
|-- vae
|-- ...
```
-
+</details>
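If convenient, the released checkpoints can likewise be pulled in one call. A small hedged sketch using huggingface_hub: the repo id is the model link above, ./ckpt mirrors the layout shown, and other components referenced in the tree (for example the image inpainting model) may come from separate repos, so adjust as needed.

```python
# Hypothetical helper, not part of the repo: fetch the released checkpoints into ./ckpt.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="TencentARC/VideoPainter", local_dir="./ckpt")
```
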

## 🏃🏼 Running Scripts

-
-
+<details>
+<summary><b>Training 🤯</b></summary>

You can train the VideoPainter using the script:

@@ -387,11 +386,11 @@ accelerate launch --config_file accelerate_config_machine_single_ds_wo_cpu.yaml
--p_random_brush 0.3 \
--id_pool_resample_learnable
```
+</details>


-
-
-### Inference 📜
+<details>
+<summary><b>Inference 📜</b></summary>

You can run inference for video inpainting or editing with the script:

@@ -411,7 +410,10 @@ bash edit_bench.sh
```

Since VideoPainter is trained on public Internet videos, it primarily performs well on general scenarios. For high-quality industrial applications (e.g., product exhibitions, virtual try-on), we recommend training the model on your domain-specific data. We welcome and appreciate any contributions of trained models from the community!
+</details>

+<details>
+<summary><b>Gradio Demo</b></summary>

You can also run inference through the gradio demo:

@@ -423,9 +425,11 @@ CUDA_VISIBLE_DEVICES=0 python app.py \
--id_adapter ../ckpt/VideoPainterID/checkpoints \
--img_inpainting_model ../ckpt/flux_inp
```
+</details>


-
+<details>
+<summary><b>Evaluation 📏</b></summary>

You can evaluate using the script:

@@ -440,19 +444,16 @@ bash eval_edit.sh
# video editing with ID resampling
bash eval_editing_id_resample.sh
```
-
+</details>

## 🤝🏼 Cite Us

```
-@
-
-
-
-
-      archivePrefix={arXiv},
-      primaryClass={cs.CV},
-      url={https://arxiv.org/abs/2503.05639},
+@article{bian2025videopainter,
+  title={VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control},
+  author={Bian, Yuxuan and Zhang, Zhaoyang and Ju, Xuan and Cao, Mingdeng and Xie, Liangbin and Shan, Ying and Xu, Qiang},
+  journal={arXiv preprint arXiv:2503.05639},
+  year={2025}
}
```
