Can I do direct speech-to-speech

#27
by srivatsan32 - opened

Right now, when I try the following code:

import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
text = "Hello, this is a test sentence."
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)

In
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH),
I still need to feed text as input. Is there a way to directly clone an audio, i.e. audio as input and audio (with the same tone, accent, and pauses) as output?

You can in ComfyUI: Resemble provided a workflow that is split into voice-to-voice and TTS parts. Upload the target voice, then upload the performance, and you get the performance in the new voice. :) If you are talking about real time, though, there will be a delayed output.
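Outside of ComfyUI, the Chatterbox repo itself also ships a voice-conversion model alongside the TTS one. A minimal sketch, based on the example in the repo's README (the exact class and parameter names may differ by version, and this needs a CUDA device plus downloaded weights; the file names are placeholders):

```python
import torchaudio as ta
from chatterbox.vc import ChatterboxVC

model = ChatterboxVC.from_pretrained(device="cuda")

# Source: the performance to convert (keeps content, pauses, prosody).
# Target: a sample of the voice you want it spoken in.
wav = model.generate("source_performance.wav",
                     target_voice_path="target_voice.wav")
ta.save("converted.wav", wav, model.sr)
```

Unlike the TTS path, no text is needed here: the input audio supplies the content and timing, and only the voice identity is swapped.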
