speechbrain/asr-streaming-conformer-librispeech

Automatic Speech Recognition

sdelangen committed on Feb 26, 2024

Commit b40c540 · verified · 1 Parent(s): aef3c9c

Update README.md

Files changed (1)

README.md +7 -4

@@ -171,9 +171,12 @@ for text_chunk in asr.transcribe_file_streaming(args.audio_path, config):
 We want to optimize some things around the model before we create a proper HuggingFace space demonstrating live streaming on CPU.
-In the mean time, this is a simple hacky demo of live ASR in the browser using Gradio's live microphone streaming feature.
-If you run this, please note that browsers may refuse to stream audio from an insecure connection, unless it is localhost.
-If you are running this on a remote server, you could use SSH port forwarding to expose the remote's port on your machine.
+In the mean time, this is a simple hacky demo of live ASR in the browser using Gradio's live microphone streaming feature.
+If you run this, please note:
+- Modern browsers refuse to stream microphone input over an untrusted connection (plain HTTP), unless it is localhost. If you are running this on a remote server, you could use SSH port forwarding to expose the remote's port on your machine.
+- Streaming using Gradio on Firefox seems to cause some issues. Chromium-based browsers seem to behave better.
 Run using:
@@ -236,7 +239,7 @@ def transcribe(stream, new_chunk):
     # HACK: we are making poor use of the resampler across chunk boundaries
     # which may degrade accuracy.
     # NOTE: we should also absolutely avoid recreating a resampler every time
-    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=asr.audio_normalizer.sample_rate)
+    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=asr.audio_normalizer.sample_rate).to(device)
     y = resampler(y)  # janky resample (probably to 16kHz)
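
The NOTE in the hunk above says the resampler should not be recreated on every call. One possible way to address that, sketched here as an assumption and not as part of this commit, is to cache one resampler per `(orig_freq, new_freq, device)` combination. `ResamplerCache` and `make_torchaudio_resampler` are hypothetical names introduced for illustration; only `torchaudio.transforms.Resample(...).to(device)` comes from the diff.

```python
from typing import Any, Callable, Dict, Tuple


class ResamplerCache:
    """Build each resampler once per (orig_freq, new_freq, device) and reuse it.

    Hypothetical helper, not part of SpeechBrain or torchaudio.
    """

    def __init__(self, factory: Callable[[int, int, str], Any]) -> None:
        self._factory = factory
        self._cache: Dict[Tuple[int, int, str], Any] = {}

    def get(self, orig_freq: int, new_freq: int, device: str = "cpu") -> Any:
        key = (int(orig_freq), int(new_freq), str(device))
        if key not in self._cache:
            # Only constructed the first time this combination is requested.
            self._cache[key] = self._factory(key[0], key[1], key[2])
        return self._cache[key]


def make_torchaudio_resampler(orig_freq: int, new_freq: int, device: str) -> Any:
    # Imported lazily so the cache itself does not require torchaudio.
    import torchaudio

    resampler = torchaudio.transforms.Resample(orig_freq=orig_freq, new_freq=new_freq)
    return resampler.to(device)
```

Inside `transcribe`, the per-call construction could then become something like `resampler = resamplers.get(sr, asr.audio_normalizer.sample_rate, device)`, with `resamplers = ResamplerCache(make_torchaudio_resampler)` created once at module level. This only avoids repeated construction; it does not address the chunk-boundary issue mentioned in the HACK comment.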