Streaming?

#3
by pscar

Thank you NVIDIA team for releasing yet another excellent ASR model!

Is there a guide on how to achieve streaming transcription using the latest parakeet-tdt-0.6b-v2 model?

NVIDIA org

You could do chunked streaming by following this script: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py. Directions on how to use it are inside the script.
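For reference, the script is launched with Hydra-style overrides; a typical invocation looks roughly like the sketch below (the override names are taken from the script's docstring, so check the script itself for the current set; model_stride=8 assumes a FastConformer encoder such as Parakeet's):

```bash
python speech_to_text_buffered_infer_rnnt.py \
    pretrained_name="nvidia/parakeet-tdt-0.6b-v2" \
    dataset_manifest="<path to evaluation manifest>" \
    output_filename="<path for output json>" \
    chunk_len_in_secs=1.6 \
    total_buffer_in_secs=4.0 \
    model_stride=8 \
    batch_size=32
```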

We noticed a bug with TDT for chunked streaming inference; we will push a fix to main soon for everyone to try!

We also have a dedicated cache-aware architecture for streaming use cases: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_fastconformer_hybrid_large_streaming_multi. We are also working on an upgraded, more performant successor to that model.
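If it helps, a minimal way to pull that checkpoint down is NeMo's `from_pretrained`; a sketch assuming a recent NeMo install ("sample.wav" is a placeholder path):

```python
import nemo.collections.asr as nemo_asr

# Download the cache-aware streaming FastConformer hybrid model from NGC.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_fastconformer_hybrid_large_streaming_multi"
)

# Quick offline sanity check on a local file.
print(asr_model.transcribe(["sample.wav"]))
```

For proper streaming that exercises the cache mechanism, see the cache-aware streaming example under examples/asr/asr_cache_aware_streaming/ in the NeMo repo.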

Hi @nithinraok, thanks for that link. Waiting eagerly for the new streaming models! About the bug: do you recommend waiting for the bugfix if it's major, or can the version on main be used already?

I second the request for live transcription. I would love an alternative to Whisper with a decent interface that runs on my laptop and works offline: press a key, record your voice, let go of the key, and it transcribes and pastes into a field.

Has it been fixed yet?
Or is there any update on the progress?

BatchedFrameASRTDT, ImportError. Could not import.

NVIDIA org

Yes, the fix is now merged to main. Use this script for performing buffered streaming: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py

Hi @nithinraok, thank you so much for the update! One question out of curiosity: according to the relevant commit, TDT does not currently support the greedy_batch decoding strategy, yet the .nemo file in this repository defaults to greedy_batch. Is this expected?

NVIDIA org

Yes, that's used by default for offline inference. For streaming, it gets changed to greedy for now.
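If anyone wants to flip it manually, something along these lines should work with NeMo's `change_decoding_strategy` (a sketch; the decoding config layout can differ across NeMo versions, so verify against yours):

```python
from copy import deepcopy
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# Clone the model's decoding config and switch greedy_batch -> greedy.
decoding_cfg = deepcopy(asr_model.cfg.decoding)
decoding_cfg.strategy = "greedy"
asr_model.change_decoding_strategy(decoding_cfg)
```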

Thanks for the update. I saw that your Hugging Face demo has an interactive interface built with Gradio. Can I deploy the streaming model interface on my own server and use your Gradio app for non-commercial display?

Hi,

I am working on a real-time mic version, and I have a working one ready to test:

https://huggingface.co/spaces/WJ88/NVIDIA-Parakeet-TDT-0.6B-v2-INT8-Real-Time-Mic-Transcription
The whole point of this Space is to fit the model into 2 vCPUs :) and it works!

The UI may not be pretty, but overall just click RECORD, speak, and watch the transcription. After you finish, please refresh the browser tab to free resources.
NOTE: the app is currently public, meaning each user's transcriptions accumulate and other users can see them. I am working on isolation, but it is what it is; it works :)

You can use NVIDIA-Parakeet-TDT-0.6B-v2 without an NVIDIA card in REAL TIME. I encourage you to try it and check the code (it's interesting that the model fits on 2 vCPUs), and finally clone it and build your own version on top! I will stick to optimizations rather than fancy features in my repo.

"I love Pain"

I am on the main branch (commit 259d684e73c45091f0b6144342133e6ceb7e824c).
@nithinraok you mentioned that TDT streaming is fixed. Just checking again.
The script speech_to_text_buffered_infer_rnnt calls BatchedFrameASRTDT for TDT from streaming_utils.py with the argument stateful_decoding, which I pass as True.
But the class BatchedFrameASRTDT in turn calls its parent BatchedFrameASRRNNT like this:
`super().__init__(asr_model, frame_len=frame_len, total_buffer=total_buffer, batch_size=batch_size)`
without passing stateful_decoding, so it remains False as defined by the default.

Is that how you intended it to be? Stateful decoding always false?
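For anyone following along, the pattern I'm describing boils down to a constructor that accepts a keyword argument but never forwards it; a self-contained illustration with stand-in class names (not the actual NeMo classes):

```python
class Parent:
    def __init__(self, stateful_decoding=False):
        self.stateful_decoding = stateful_decoding

class Child(Parent):
    def __init__(self, stateful_decoding=False):
        # Accepted here but never forwarded, so Parent silently
        # falls back to its default of False.
        super().__init__()

print(Child(stateful_decoding=True).stateful_decoding)  # -> False
```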
