---
license: cc-by-4.0
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- pytorch
- NeMo
- hf-asr-leaderboard
- coreml
- apple
language:
- en
pipeline_tag: automatic-speech-recognition
base_model:
- nvidia/parakeet-tdt-0.6b-v2
---
# Parakeet TDT 0.6B V2 - CoreML

This is a CoreML-optimized version of NVIDIA's Parakeet TDT 0.6B V2 model, designed for high-performance automatic speech recognition on Apple platforms.
## Model Description

This model has been converted to CoreML format for efficient on-device inference on Apple Silicon and iOS devices, enabling real-time speech recognition with a minimal memory footprint. The converted models will continue to evolve as we optimize performance and accuracy.
## Usage in Swift

See the [FluidAudio repository](https://github.com/FluidInference/FluidAudioSwift) for instructions.
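If you want to call the converted CoreML components directly instead of going through FluidAudio, the sketch below shows the generic CoreML invocation pattern. It is a minimal sketch: the model file name and the `audio_signal` feature name are placeholder assumptions rather than the actual names shipped in this repository, so check the converted model package for the real input/output descriptions.

```swift
import CoreML
import Foundation

// Minimal sketch: load one of the converted CoreML models and run a single
// prediction on a buffer of 16 kHz mono samples. The model URL and the
// "audio_signal" feature name are placeholders; inspect the .mlmodelc /
// .mlpackage in this repository for the real feature descriptions.
func runEncoder(on samples: [Float], modelURL: URL) throws -> MLFeatureProvider {
    let config = MLModelConfiguration()
    config.computeUnits = .all  // let CoreML schedule work across ANE/GPU/CPU

    let model = try MLModel(contentsOf: modelURL, configuration: config)

    // Pack the audio into an MLMultiArray of shape [1, N].
    let input = try MLMultiArray(shape: [1, NSNumber(value: samples.count)],
                                 dataType: .float32)
    for (i, sample) in samples.enumerated() {
        input[i] = NSNumber(value: sample)
    }

    let features = try MLDictionaryFeatureProvider(dictionary: ["audio_signal": input])
    return try model.prediction(from: features)
}
```

Note that this only covers a raw forward pass; TDT decoding over the model outputs still has to be applied, which is what the Swift integration in FluidAudio is meant to take care of.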
## Performance

- Real-time factor: ~110x on M4 Pro
- Memory usage: ~800MB peak
- Supported platforms: macOS 14+, iOS 17+
- Optimized for: Apple Silicon
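A real-time factor of ~110x means the model processes audio roughly 110 times faster than it plays back, so an hour-long recording takes on the order of 33 seconds of compute on that hardware.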
## Model Details

- Architecture: FastConformer-TDT
- Parameters: 0.6B
- Sample rate: 16 kHz (see the conversion sketch below if your audio is captured in a different format)
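The model expects 16 kHz mono input. If your capture format differs, you typically need a conversion step before inference; the sketch below uses Apple's `AVAudioConverter`, with the helper name and error handling purely illustrative.

```swift
import AVFoundation

// Sketch: convert an arbitrary PCM buffer to the 16 kHz mono Float32 format
// the model expects. Helper name and error handling are illustrative only.
func resampleTo16kMono(_ buffer: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                           sampleRate: 16_000,
                                           channels: 1,
                                           interleaved: false),
          let converter = AVAudioConverter(from: buffer.format, to: targetFormat) else {
        return nil
    }

    // Size the output buffer for the resampled frame count (plus a little slack).
    let ratio = targetFormat.sampleRate / buffer.format.sampleRate
    let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else {
        return nil
    }

    // Feed the source buffer once, then signal end of stream.
    var supplied = false
    var conversionError: NSError?
    let status = converter.convert(to: output, error: &conversionError) { _, outStatus in
        if supplied {
            outStatus.pointee = .endOfStream
            return nil
        }
        supplied = true
        outStatus.pointee = .haveData
        return buffer
    }
    return (conversionError == nil && status != .error) ? output : nil
}
```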
## License

This model is released under the CC-BY-4.0 license. See the LICENSE file for details.
## Acknowledgments

Based on NVIDIA's Parakeet TDT model. CoreML conversion and Swift integration by the FluidInference team.