Error after ~1 minute of ASR on a Hungarian sample.
#12, opened by robert1968
Hi,
- When I try to load a 21-minute mp3 file, it fails with CUDA out of memory on an RTX 3060 (24 GB GPU):
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 15.83 GiB. GPU 0 has a total capacity of 23.59 GiB of which 1.48 GiB is free. Process 20329 has 0 bytes memory in use. Of the allocated memory 21.43 GiB is allocated by PyTorch, and 19.50 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
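As a workaround sketch (not an API from this repo), a long file can be split into overlapping chunks so each call to the model fits in GPU memory, with each chunk then passed to something like NeMo's `model.transcribe(...)` separately. Assuming 16 kHz mono input, the chunking step might look like:

```python
import numpy as np

def chunk_audio(samples: np.ndarray, sample_rate: int,
                chunk_s: float = 60.0, overlap_s: float = 2.0):
    """Split a mono waveform into overlapping chunks of at most chunk_s seconds."""
    size = int(chunk_s * sample_rate)               # samples per chunk
    step = int((chunk_s - overlap_s) * sample_rate) # hop between chunk starts
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + size])
        start += step
    return chunks

# Example: 21 minutes of 16 kHz audio -> 60 s chunks with 2 s overlap
audio = np.zeros(21 * 60 * 16000, dtype=np.float32)
chunks = chunk_audio(audio, 16000)
```

The overlap exists so that words cut at a chunk boundary appear in full in one of the two neighboring chunks; the transcripts then need to be deduplicated at the seams when stitching the text back together.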
- When I try a 10-minute sample wav, it starts to transcribe, but after about 1 minute it repeats words and the output is useless.
Debug: Long-form timestamps not requested
Preprocessing audio...
Transcribing audio (long-form)...
Debug: Long-form calling transcribe without timestamps
[NeMo W 2025-09-09 17:41:34 dataloader:732] The following configuration keys are ignored by Lhotse dataloader: trim_silence
[NeMo W 2025-09-09 17:41:34 dataloader:479] You are using a non-tarred dataset and requested tokenization during data sampling (pretokenize=True). This will cause the tokenization to happen in the main (GPU) process,possibly impacting the training speed if your tokenizer is very large.If the impact is noticable, set pretokenize=False in dataloader config.(note: that will disable token-per-second filtering and 2D bucketing features)
Transcribing: 1it [00:14, 14.69s/it]
Debug: Long-form result type: <class 'nemo.collections.asr.parts.utils.rnnt_utils.Hypothesis'>
Debug: Long-form result: Hypothesis(score=0.0, y_sequence=tensor([ 2083, 16100, 1338, ..., 16067, 1245, 16132]), text=' MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI, MI', dec_out=None, dec_state=None, timestamp={'word': [], 'segment': [], 'char': []}, alignments=None, frame_confidence=None, token_confidence=None, word_confidence=None, length=0, y=None, lm_state=None, lm_scores=None, ngram_lm_state=None, tokens=None, last_token=None, token_duration=None, last_frame=None)
Debug: Long-form timestamps not requested
I am using the main branch.