Replace characters in videos using reference images
Generate realistic conversational videos from audio inputs