---
license: apache-2.0
language:
- en
pipeline_tag: text-to-video
tags:
- diffusion
- video-to-video
- stable-diffusion
---

# Live2Diff: **Live** Stream Translation via Uni-directional Attention in Video **Diffusion** Models

**Authors:** [Zhening Xing](https://github.com/LeoXing1996), [Gereon Fox](https://people.mpi-inf.mpg.de/~gfox/), [Yanhong Zeng](https://zengyh1900.github.io/), [Xingang Pan](https://xingangpan.github.io/), [Mohamed Elgharib](https://people.mpi-inf.mpg.de/~elgharib/), [Christian Theobalt](https://people.mpi-inf.mpg.de/~theobalt/), [Kai Chen †](https://chenkai.site/) (†: corresponding author)

[![arXiv](https://img.shields.io/badge/arXiv-2407.08701-b31b1b.svg)](https://arxiv.org/abs/2407.08701)
[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://live2diff.github.io/)
[![Github Repo](https://img.shields.io/badge/Github-Repo-blue?logo=GitHub)](https://live2diff.github.io/)

## Key Features

* **Uni-directional** Temporal Attention with **Warmup** Mechanism
* **Multitimestep KV-Cache** for Temporal Attention during Inference
* **Depth Prior** for Better Structure Consistency
* Compatible with **DreamBooth and LoRA** for Various Styles
* **TensorRT** Supported

The speed evaluation was conducted on **Ubuntu 20.04.6 LTS** and **PyTorch 2.2.2** with an **RTX 4090 GPU** and an **Intel(R) Xeon(R) Platinum 8352V CPU**. The number of denoising steps is set to 2.

| Resolution | TensorRT |    FPS    |
| :--------: | :------: | :-------: |
| 512 x 512  |  **On**  | **16.43** |
| 512 x 512  |   Off    |   6.91    |
| 768 x 512  |  **On**  | **12.15** |
| 768 x 512  |   Off    |   6.29    |

## Real-Time Video2Video Demo

* Human Face (Web Camera Input)
* Anime Character (Screen Video Input)