OutOfMemoryError: CUDA out of memory on RTX A5000
Hi,
Thanks to the team for contributing this impressive model.
I'm trying to transcribe an audio file (native English speech) that spans ~40 minutes. The model is unable to complete the transcription and fails with an OutOfMemoryError: CUDA out of memory error on an RTX A5000 GPU (24 GB VRAM). Other hardware details: 128 GB RAM, 32 CPUs.
I tried looking for preprocessing parameters that would chunk the audio into smaller segments (similar to OpenAI's Whisper), but couldn't find any.
Here's the code I used:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
output = asr_model.transcribe('input.wav', timestamps=True)
# NOTE: using timestamps=False did not resolve OOM ERROR :(
Is this expected? I'd be grateful for any ideas or suggestions. Thanks!
Hi, you could do two things:
- Apply limited-context (local) attention settings with:
import torch

# Switch to local (limited-context) attention and chunked subsampling to bound memory use
asr_model.change_attention_model("rel_pos_local_attn", [256, 256])
asr_model.change_subsampling_conv_chunking_factor(1)
# Run inference in bfloat16 to further reduce memory
asr_model.to(torch.bfloat16)
output = asr_model.transcribe('input.wav', timestamps=True)
This enables long-form inference; the maximum audio length you can handle will depend on available GPU memory. It comes with a slight degradation in accuracy, but not much.
- Use chunked inference with this script: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py . It's not integrated into .transcribe() yet; a rough manual-chunking sketch follows below.
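In the meantime, if you want a quick Python workaround before the buffered script is integrated, a naive fixed-length chunking sketch could look like the one below. The 60-second chunk length is an arbitrary assumption (tune it to your GPU memory), and it assumes the soundfile package is installed; note that hard cuts can split words at chunk boundaries, which the buffered-inference script is designed to handle more gracefully.

import soundfile as sf
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")

# Load the full waveform and cut it into fixed-length chunks
audio, sr = sf.read('input.wav')
chunk_len = 60 * sr  # 60-second chunks; adjust to fit your GPU memory

chunk_paths = []
for i, start in enumerate(range(0, len(audio), chunk_len)):
    path = f'chunk_{i}.wav'
    sf.write(path, audio[start:start + chunk_len], sr)
    chunk_paths.append(path)

# Transcribe all chunks in one call and join the texts
# (depending on your NeMo version, transcribe() returns strings or Hypothesis objects)
outputs = asr_model.transcribe(chunk_paths)
full_text = ' '.join(o.text if hasattr(o, 'text') else o for o in outputs)
print(full_text)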