Is it limited to producing a single track?

#21

by BigDeeper - opened Aug 31

Aug 31

It's not clear from the card whether it is possible to produce separate tracks for different speakers, with appropriate silences to allow the "others" to speak.

YaoyaoChang

Sep 1

Currently this isn’t supported. All speakers are rendered into a single audio track, rather than separate tracks.

PsiPi

Sep 3 • edited Sep 4

like that?

YaoyaoChang

Sep 3

Yes, all voices are currently mixed into a single track. If you’d like to separate them, we recommend using post-processing techniques such as VAD and diarization to manually split the generated audio.
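For anyone trying the post-processing route, here is a minimal sketch of the final splitting step. It assumes you already have speaker-labelled time segments from a VAD/diarization tool of your choice. The `split_by_speaker` helper, the `(start_seconds, end_seconds, speaker)` segment format, and the toy data are all hypothetical and are not part of VibeVoice. Each output track keeps the full length of the original audio and is silent wherever that speaker is not talking, which gives the "silences to allow others to speak" asked about above.

```python
import numpy as np

def split_by_speaker(audio, sample_rate, segments):
    """Split a mono track into one full-length track per speaker.

    `segments` is an iterable of (start_seconds, end_seconds, speaker)
    tuples. This format is an assumption for the sketch; adapt it to
    whatever your diarization tool actually returns.
    """
    tracks = {}
    for start_s, end_s, speaker in segments:
        # Convert times to sample indices and clamp them to the audio bounds.
        start = max(0, int(round(start_s * sample_rate)))
        end = min(len(audio), int(round(end_s * sample_rate)))
        if end <= start:
            continue
        # Each speaker starts from an all-silent track of the same length.
        track = tracks.setdefault(speaker, np.zeros_like(audio))
        track[start:end] = audio[start:end]
    return tracks

# Toy stand-in for generated audio: 3 seconds of noise at 16 kHz.
sample_rate = 16000
rng = np.random.default_rng(0)
audio = rng.standard_normal(3 * sample_rate).astype(np.float32)

# Hypothetical diarization output.
segments = [
    (0.0, 1.0, "Speaker 1"),
    (1.0, 2.5, "Speaker 2"),
    (2.5, 3.0, "Speaker 1"),
]

tracks = split_by_speaker(audio, sample_rate, segments)
```

The resulting arrays can be written out with any audio library. Note that this only works cleanly where speakers do not overlap; overlapping speech would need an actual source-separation model rather than time slicing.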

PsiPi

Sep 4 • edited Sep 4

https://github.com/paperwave/VibeVoice/pull/1/files

Single pass activation based voice separation

PsiPi

Sep 4

> Currently this isn’t supported. All speakers are rendered into a single audio track, rather than separate tracks.

I noted that Layer 0/1 dimension 609 seems to be the relevant activation.
