InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
We propose InfiniteTalk, a novel sparse-frame video dubbing framework. Given an input video and an audio track, InfiniteTalk synthesizes a new video with accurate lip synchronization while simultaneously aligning head movements, body posture, and facial expressions with the audio. Unlike traditional dubbing methods that focus solely on the lips, InfiniteTalk enables infinite-length video generation with accurate lip synchronization and consistent identity preservation. Besides video dubbing, InfiniteTalk can also serve as an image-audio-to-video model, taking a single image and an audio track as input.
- 💬 Sparse-frame Video Dubbing – Synchronizes not only the lips, but also the head, body, and facial expressions
- ⏱️ Infinite-Length Generation – Supports unlimited video duration
- ✨ Stability – Reduces hand/body distortions compared to MultiTalk
- 🚀 Lip Accuracy – Achieves superior lip synchronization compared to MultiTalk
This repository hosts the model weights for InfiniteTalk. For installation, usage instructions, and further documentation, please visit our GitHub repository.
License Agreement
The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information with intent to harm, spreads misinformation, or targets vulnerable populations.