How does the model handle timestamp decoding?

#50
by StephennFernandes - opened

I wanted to know how the model handles timestamp decoding during inference, for both word-level and segment-level timestamps.
Are these timestamps learned as part of ASR training, similar to Whisper, or are the output transcription and audio passed through VAD and MFA to obtain accurate timestamped transcriptions?
