Commit
·
1df7587
1
Parent(s):
ec6f575
Update README.md
Browse files
README.md
CHANGED
|
@@ -1,3 +1,27 @@
|
|
| 1 |
---
|
| 2 |
license: gpl-3.0
|
| 3 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
license: gpl-3.0
|
| 3 |
---
|
| 4 |
+
|
| 5 |
+
# DEVA: Tracking Anything with Decoupled Video Segmentation
|
| 6 |
+
|
| 7 |
+

|
| 8 |
+
|
| 9 |
+
[Ho Kei Cheng](https://hkchengrex.github.io/), [Seoung Wug Oh](https://sites.google.com/view/seoungwugoh/), [Brian Price](https://www.brianpricephd.com/), [Alexander Schwing](https://www.alexander-schwing.de/), [Joon-Young Lee](https://joonyoung-cv.github.io/)
|
| 10 |
+
|
| 11 |
+
University of Illinois Urbana-Champaign and Adobe
|
| 12 |
+
|
| 13 |
+
ICCV 2023
|
| 14 |
+
|
| 15 |
+
[[arXiV (coming soon)]]() [[PDF]](https://drive.google.com/file/d/1lAgg-j8d6EH1XYUz9htDaZDh4pxuIslb) [[Project Page]](https://hkchengrex.github.io/Tracking-Anything-with-DEVA/)
|
| 16 |
+
|
| 17 |
+
## Highlights
|
| 18 |
+
1. Provide long-term, open-vocabulary video segmentation with text-prompts out-of-the-box.
|
| 19 |
+
2. Fairly easy to **integrate your own image model**! Wouldn't you or your reviewers be interested in seeing examples where your image model also works well on videos :smirk:? No finetuning is needed!
|
| 20 |
+
|
| 21 |
+
## Abstract
|
| 22 |
+
|
| 23 |
+
We develop a decoupled video segmentation approach (**DEVA**), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.
|
| 24 |
+
Due to this design, we only need an image-level model for the target task and a universal temporal propagation model which is trained once and generalizes across tasks.
|
| 25 |
+
To effectively combine these two modules, we propose a (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation.
|
| 26 |
+
We show that this decoupled formulation compares favorably to end-to-end approaches in several tasks, most notably in large-vocabulary video panoptic segmentation and open-world video segmentation.
|
| 27 |
+
|