RTFx Calculation / Batch Size (#40)
Opened by lachln
Was RTFx calculated from the total duration of all audio samples in the batch? For example, if 8 samples were transcribed in parallel, each 10 seconds long (80 seconds of audio in total), and transcribing the batch took 4 seconds, is RTFx calculated as 80 / 4 = 20? I'm trying to understand the crazy-high RTFx numbers.
Yes, that's how I interpret it.
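A minimal sketch of that interpretation: RTFx is the total audio duration processed divided by the wall-clock time it took. The function name `rtfx` is hypothetical, not part of any leaderboard or library code. Under this reading, the question's batch of 8 × 10 s clips finished in 4 s gives 20x, whereas counting only one clip's audio against the same 4 s would give just 2.5x. This is why larger batches push the number up.

```python
def rtfx(audio_seconds_per_file, wall_clock_seconds):
    """Inverse real-time factor: total audio seconds / processing seconds.

    Hypothetical helper illustrating the interpretation discussed above.
    """
    total_audio = sum(audio_seconds_per_file)
    return total_audio / wall_clock_seconds


# Example from the question: 8 clips of 10 s each, transcribed as one batch in 4 s.
batch = [10.0] * 8
print(rtfx(batch, 4.0))    # 20.0  (80 s of audio / 4 s)

# Counting only a single clip's audio against the same batch time.
print(rtfx([10.0], 4.0))   # 2.5
```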
I recall reading that parakeet-tdt-0.6b-v2 used a batch size of 128 on an NVIDIA A10G 24GB GPU to reach 3000+ RTFx.
For reference, I transcribed 400 hours of podcast episodes (averaging 5-15 minutes each) in 24 minutes on my PC using parakeet and two GPUs:
- RTX 4080 16GB: batch size 6, RTFx of 533x
- RTX 3080 Ti 12GB: batch size 4, RTFx of 422x
That's a combined RTFx of 955x (533 + 422). Given that, 3000+ RTFx with better GPUs, shorter audio files, and much larger batch sizes seems reasonable.
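A quick sanity check of the numbers above, as a sketch. It assumes both GPUs stay busy on disjoint files for the whole run, so their throughputs simply add. The helper names are hypothetical. At 955x, 400 hours of audio takes about 25 minutes, which is roughly consistent with the reported 24 minutes. At the ~3000x cited for parakeet-tdt-0.6b-v2, the same audio would take about 8 minutes.

```python
def combined_rtfx(per_gpu_rtfx):
    """Aggregate RTFx of several GPUs transcribing disjoint files concurrently.

    Assumes every GPU is busy for the entire run, so throughputs add.
    """
    return sum(per_gpu_rtfx)


def wall_clock_minutes(audio_hours, rtfx):
    """Minutes needed to transcribe `audio_hours` of audio at a given RTFx."""
    return audio_hours * 60 / rtfx


# Figures from the post: RTX 4080 at 533x, RTX 3080 Ti at 422x.
total = combined_rtfx([533, 422])
print(total)                                     # 955
print(round(wall_clock_minutes(400, total), 1))  # 25.1
print(wall_clock_minutes(400, 3000))             # 8.0
```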