FlashSR
FlashSR is a 2MB audio super-resolution model based on the upsampler architecture from HierSpeech++. It upscales 16kHz audio to 48kHz at 200x to 400x real-time speed.
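Because the output rate is exactly three times the input rate, each second of 16kHz input (16,000 samples) becomes 48,000 output samples. By the Nyquist limit, a 16kHz recording contains content only up to 8kHz, while 48kHz audio can represent content up to 24kHz. The 8kHz to 24kHz band is what a super-resolution model has to reconstruct, since plain resampling cannot create content above the original Nyquist frequency. The Python sketch below works through that arithmetic; the constant and function names are illustrative and are not taken from FlashSR's code.

```python
# Sample-rate arithmetic for 16 kHz -> 48 kHz audio super-resolution.
# The two rates come from the model card; all helper names are illustrative only.

INPUT_RATE_HZ = 16_000
OUTPUT_RATE_HZ = 48_000


def upsampling_factor(in_rate_hz: int, out_rate_hz: int) -> int:
    """Return out_rate / in_rate, raising if the ratio is not a whole number."""
    if out_rate_hz % in_rate_hz != 0:
        raise ValueError("output rate must be an integer multiple of input rate")
    return out_rate_hz // in_rate_hz


def nyquist_hz(rate_hz: int) -> int:
    """Highest frequency representable at the given sample rate."""
    return rate_hz // 2


def missing_band_hz(in_rate_hz: int, out_rate_hz: int) -> tuple[int, int]:
    """Band absent from the input but representable in the output."""
    return nyquist_hz(in_rate_hz), nyquist_hz(out_rate_hz)


def output_length(num_input_samples: int) -> int:
    """Number of 48 kHz output samples produced from 16 kHz input samples."""
    return num_input_samples * upsampling_factor(INPUT_RATE_HZ, OUTPUT_RATE_HZ)


if __name__ == "__main__":
    print(upsampling_factor(INPUT_RATE_HZ, OUTPUT_RATE_HZ))  # 3
    print(missing_band_hz(INPUT_RATE_HZ, OUTPUT_RATE_HZ))    # (8000, 24000)
    print(output_length(16_000))                              # 48000 (one second)
```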
Details
- Model Size: 2MB
- Input Rate: 16kHz
- Output Rate: 48kHz
- Inference Speed: 200x - 400x real-time, depending on GPU and dtype
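The speed figures are multiples of real time: a speed of N× real-time means N seconds of audio are processed per second of wall-clock time, so at 200x to 400x a 60-second clip takes roughly 0.15 to 0.3 seconds. The sketch below shows one way such a figure could be measured. `upsample` is a hypothetical stand-in for whatever inference call the FlashSR repository actually exposes, and the example plugs in naive sample repetition rather than the model itself.

```python
import time
from typing import Callable, Sequence

# Hypothetical signature: any callable mapping 16 kHz samples to 48 kHz samples.
Upsampler = Callable[[Sequence[float]], Sequence[float]]


def speed_x_realtime(upsample: Upsampler, audio: Sequence[float],
                     input_rate_hz: int = 16_000) -> float:
    """Seconds of input audio processed per second of wall-clock time."""
    audio_seconds = len(audio) / input_rate_hz
    start = time.perf_counter()
    upsample(audio)
    elapsed = time.perf_counter() - start
    return audio_seconds / max(elapsed, 1e-9)  # guard against a zero timer delta


def expected_processing_seconds(audio_seconds: float, speed: float) -> float:
    """Wall-clock time needed for audio_seconds of input at `speed` x real-time."""
    return audio_seconds / speed


def repeat_upsample(audio: Sequence[float]) -> list[float]:
    # Stand-in for the model: naive 3x sample repetition, NOT FlashSR's method.
    return [s for s in audio for _ in range(3)]


if __name__ == "__main__":
    one_second = [0.0] * 16_000
    print(f"{speed_x_realtime(repeat_upsample, one_second):.0f}x real-time")
    # Using the model card's quoted range for a 60 s clip:
    print(expected_processing_seconds(60.0, 200.0))  # 0.3
    print(expected_processing_seconds(60.0, 400.0))  # 0.15
```

When timing a GPU model, note that kernels run asynchronously, so the clock should only be read after synchronizing (for example with `torch.cuda.synchronize()` in PyTorch), and a few warm-up runs help exclude one-time setup costs.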
Performance Summary
FlashSR is designed for high-speed frequency reconstruction. It has a much lower computational footprint than alternatives such as Resemble-Enhance and ClearerVoice (over 100x smaller and more than 10x faster, per the table below) while maintaining similar output quality.
Benchmark Comparison
| Model | Speed | Model Size |
|---|---|---|
| FlashSR | 200x - 400x real-time | 2MB |
| Resemble-Enhance | < 20x real-time | ~700MB+ |
| ClearerVoice | < 20x real-time | ~200MB+ |
Usage
Usage instructions and source code are available on GitHub: https://github.com/ysharma3501/FlashSR
Credits
Thanks to the authors of HierSpeech++, whose 48kHz upsampler this model is based on.