Can I do direct speech-to-speech

#27
by srivatsan32 - opened

Right now, when I try the following code:

import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
text = "Hello, this is a test sentence."
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)

In
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH),
I still need to feed text as input. Is there a way to directly clone an audio, i.e. audio as input and audio (with the same tone, accent, and pauses) as output?

You can in ComfyUI: Resemble provided a workflow that is split into voice-to-voice and TTS parts. Upload the target voice, then upload the performance, and you get the performance in the new voice. :) If you are talking about real time, though, there will be a delayed output.
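Outside of ComfyUI, the Chatterbox repo itself also ships a voice-conversion model alongside the TTS one. A minimal sketch, based on the example in the repo's README (the exact class and parameter names may differ by version, and this needs a CUDA device plus downloaded weights; the file names are placeholders):

```python
import torchaudio as ta
from chatterbox.vc import ChatterboxVC

model = ChatterboxVC.from_pretrained(device="cuda")

# Source: the performance to convert (keeps content, pauses, prosody).
# Target: a sample of the voice you want it spoken in.
wav = model.generate("source_performance.wav",
                     target_voice_path="target_voice.wav")
ta.save("converted.wav", wav, model.sr)
```

Unlike the TTS path, no text is needed here: the input audio supplies the content and timing, and only the voice identity is swapped.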
