Is it limited to producing a single track?

#21
by BigDeeper - opened

It's not clear from the model card: is it possible to produce separate tracks for different speakers, with appropriate silences where the "other" speakers talk?

Currently this isn’t supported. All speakers are rendered into a single audio track, rather than separate tracks.


Like that?

Yes, all voices are currently mixed into a single track. If you'd like to separate them, we recommend post-processing the generated audio, for example with voice activity detection (VAD) and speaker diarization, to split it into per-speaker tracks manually.
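For anyone trying the post-processing route, here is a minimal sketch of the splitting step. It assumes you already have diarization output as a list of `(start_sec, end_sec, speaker)` turns from whatever VAD/diarization tool you use (that input format and the helper name `split_by_speaker` are hypothetical, not part of the model's API). Each speaker gets a track as long as the original mix, silent outside that speaker's turns, which gives the "appropriate silences" asked about above.

```python
import numpy as np

def split_by_speaker(audio, sample_rate, segments):
    """Split a mixed mono signal into one same-length track per speaker.

    segments: list of (start_sec, end_sec, speaker_label) tuples, assumed to
    come from an external diarization step (hypothetical input format).
    Samples outside a speaker's turns are left as zeros (silence).
    """
    audio = np.asarray(audio, dtype=np.float32)
    tracks = {}
    for start, end, speaker in segments:
        # Convert turn boundaries from seconds to sample indices, clamped to the signal.
        i0 = max(0, int(round(start * sample_rate)))
        i1 = min(len(audio), int(round(end * sample_rate)))
        if i1 <= i0:
            continue  # skip empty or out-of-range turns
        # Lazily create a silent track for each new speaker, then copy in their samples.
        track = tracks.setdefault(speaker, np.zeros_like(audio))
        track[i0:i1] = audio[i0:i1]
    return tracks
```

Note that this only cuts the mix by time: where speakers overlap, the overlapping samples are copied into both tracks rather than truly separated.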

https://github.com/paperwave/VibeVoice/pull/1/files

Single-pass, activation-based voice separation


I found that dimension 609 of the Layer 0/1 activations seems to be the relevant one.
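To illustrate the general idea of using a single activation to tell speakers apart, here is a rough sketch. It is not the PR's code. It assumes you have already extracted a 1-D array with one value of that dimension per audio frame (for example via a PyTorch forward hook; how frames map to time is an assumption captured by `frame_sec`). It also assumes, purely hypothetically, that values above `threshold` mean speaker "A" and the rest mean speaker "B". The helper turns that per-frame signal into `(start_sec, end_sec, speaker)` turns that could then be used to cut the mixed audio into per-speaker tracks.

```python
import numpy as np

def activation_to_segments(activation, frame_sec, threshold=0.0):
    """Convert one hidden unit's per-frame activation into speaker turns.

    ASSUMPTION (hypothetical): frames with activation > threshold belong to
    speaker "A", all other frames to speaker "B". frame_sec is the assumed
    duration of one frame in seconds.
    Returns a list of (start_sec, end_sec, speaker) tuples.
    """
    act = np.asarray(activation, dtype=np.float32)
    labels = np.where(act > threshold, "A", "B")
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run of identical labels when the label changes
        # or when we reach the end of the signal.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * frame_sec, i * frame_sec, str(labels[start])))
            start = i
    return segments
```

A single fixed threshold is the simplest possible decision rule; real activations may need smoothing or a per-file threshold before the turns are usable.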
