Ahmadzei's picture
update 1
57bdca5
raw
history blame
360 Bytes
The hidden unit is mapped to an embedding to make a prediction.
Encoder-decoder[[audio-encoder-decoder]]
Speech2Text is a speech model designed for automatic speech recognition (ASR) and speech translation. The model accepts log mel-filter bank features extracted from the audio waveform and pretrained autoregressively to generate a transcript or translation.