OutOfMemoryError: CUDA out of memory. on RTX A5000

#46
by akskuchi - opened

Hi,

Thanks to the team for contributing this impressive model.

I'm trying to transcribe an audio file (native English speech) that spans ~40 minutes. The model is unable to perform the transcription and fails with OutOfMemoryError: CUDA out of memory on an RTX A5000 GPU (24 GB VRAM). Other hardware details: 128 GB RAM, 32 CPUs.
I looked for preprocessing parameters that would chunk the audio into smaller segments (similar to OpenAI's Whisper), but couldn't find any.

Here's the code I used:

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
output = asr_model.transcribe('input.wav', timestamps=True) 
# NOTE: using timestamps=False did not resolve OOM ERROR :(

Is this expected? I'd be grateful for any ideas or suggestions. Thanks!

Hi, you could do two things:

  1. Apply limited-context (local) attention settings:

import torch

# Switch to local attention with a left/right context of 256 frames
asr_model.change_attention_model("rel_pos_local_attn", [256, 256])
asr_model.change_subsampling_conv_chunking_factor(1)
asr_model.to(torch.bfloat16)  # bfloat16 roughly halves activation memory
output = asr_model.transcribe('input.wav', timestamps=True)

This enables long-form inference; the maximum audio length depends on available GPU memory. It comes at a small accuracy degradation, but not much.

  2. Chunked inference using this script: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/rnnt/speech_to_text_buffered_infer_rnnt.py . It's not integrated into .transcribe() yet.
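Until chunking is integrated into .transcribe(), a rough do-it-yourself alternative (a sketch, not the buffered-inference script above) is to compute fixed-length spans over the waveform, write each span to its own WAV file, and pass the list of files to .transcribe(). The 30-second chunk length and the lack of overlap are assumptions here, so words at chunk boundaries may be clipped or duplicated:

```python
def chunk_spans(n_samples, sr, chunk_sec=30):
    # Fixed-length, non-overlapping (start, end) sample spans covering
    # the whole signal; the last span may be shorter than chunk_sec.
    step = chunk_sec * sr
    return [(s, min(s + step, n_samples)) for s in range(0, n_samples, step)]

# Usage sketch (soundfile for WAV I/O is an assumption, any audio
# library works; asr_model is the loaded Parakeet model from above):
# import soundfile as sf
# audio, sr = sf.read("input.wav")
# paths = []
# for i, (s, e) in enumerate(chunk_spans(len(audio), sr)):
#     path = f"chunk_{i:04d}.wav"
#     sf.write(path, audio[s:e], sr)
#     paths.append(path)
# outputs = asr_model.transcribe(paths, timestamps=True)
# full_text = " ".join(o.text for o in outputs)
```

Each ~30-second chunk easily fits in 24 GB, at the cost of losing cross-chunk context that the buffered-inference script handles more carefully.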
