---
license: cc-by-4.0
track_downloads: true
language:
- en
- es
- fr
- de
- bg
- hr
- cs
- da
- nl
- et
- fi
- el
- hu
- it
- lv
- lt
- mt
- pl
- pt
- ro
- sk
- sl
- sv
- ru
- uk
pipeline_tag: automatic-speech-recognition
library_name: nemo
datasets:
- nvidia/Granary
- nemo/asr-set-3.0
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- pytorch
- NeMo
- hf-asr-leaderboard
widget:
- example_title: Librispeech sample 1
src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
base_model:
- nvidia/parakeet-tdt-0.6b-v3
---
# **<span style="color:#5DAF8D"> 🧃 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model CoreML </span>**
<style>
img {
display: inline;
}
</style>
[Discord](https://discord.gg/WNsvaCtmDe) | [GitHub](https://github.com/FluidInference/FluidAudio)
On‑device multilingual ASR model converted to Core ML for Apple platforms. This model powers FluidAudio’s batch ASR and is the same model used in our backend. It supports 25 European languages and is optimized for low‑latency, private, offline transcription.
The conversion script and benchmarks are available at:
https://github.com/FluidInference/mobius/tree/main/models/tts/parakeet-tdt-v3-0.6b/coreml
## Highlights
- **Core ML**: Runs fully on‑device (ANE/CPU) on Apple Silicon.
- **Multilingual**: 25 European languages; see the FluidAudio repository for usage examples.
- **Performance**: ~110× faster than real time on M4 Pro for batch ASR (1 min of audio in ≈ 0.5 s).
- **Privacy**: No network calls required once models are downloaded.
## Intended Use
- **Batch transcription** of complete audio files on macOS/iOS.
- **Local dictation** and note‑taking apps where privacy and latency matter.
- **Embedded ASR** in production apps via the FluidAudio Swift framework.
## Supported Platforms
- macOS 14+ (Apple Silicon recommended)
- iOS 17+
## Model Details
- **Architecture**: Parakeet TDT v3 (Token Duration Transducer, 0.6B parameters)
- **Input audio**: 16 kHz, mono, Float32 PCM in range [-1, 1]
- **Languages**: 25 European languages (multilingual)
- **Precision**: Mixed precision optimized for Core ML execution (ANE/CPU)
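Because the model expects 16 kHz, mono, Float32 samples in [-1, 1], audio from other sources has to be converted first; FluidAudio's loaders normally handle this preprocessing. The sketch below shows only the channel downmix and integer-to-float scaling steps, assuming interleaved 16-bit PCM that is already sampled at 16 kHz (resampling is not covered). The function name `monoFloat32(fromInterleaved:channelCount:)` is hypothetical and not part of FluidAudio.

```swift
/// Hypothetical helper (not a FluidAudio API): converts interleaved 16-bit PCM
/// into mono Float32 samples in [-1, 1].
/// Assumes the input is already sampled at 16 kHz; no resampling is done.
func monoFloat32(fromInterleaved pcm: [Int16], channelCount: Int) -> [Float] {
    precondition(channelCount > 0, "channelCount must be positive")
    // Integer division drops any trailing partial frame.
    let frameCount = pcm.count / channelCount
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for channel in 0..<channelCount {
            // Dividing by 32768 maps [-32768, 32767] onto [-1, 1).
            sum += Float(pcm[frame * channelCount + channel]) / 32768
        }
        // Average the channels to downmix to mono.
        mono[frame] = sum / Float(channelCount)
    }
    return mono
}
```

At 16 kHz, one second of audio corresponds to 16,000 mono samples, so the returned array's `count / 16_000` gives the clip length in seconds.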
## Performance
- **Speed**: ~110× faster than real time (RTFx) on M4 Pro in batch mode
- Throughput and latency vary with device, input duration, and compute units (ANE/CPU).
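The speed figure is the inverse of the conventional real-time factor: expected processing time is the audio duration divided by the speed-up. At exactly 110×, a 60 s clip takes 60 / 110 ≈ 0.55 s, in line with the ≈ 0.5 s quoted in Highlights. The helper names below are illustrative, not FluidAudio APIs, and measured numbers will vary with device and compute units.

```swift
/// Speed-up over real time (often written RTFx): seconds of audio
/// transcribed per second of wall-clock processing time.
func realTimeSpeedup(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processing time must be positive")
    return audioSeconds / processingSeconds
}

/// Expected wall-clock processing time for a clip at a given speed-up.
func expectedProcessingSeconds(audioSeconds: Double, speedup: Double) -> Double {
    precondition(speedup > 0, "speed-up must be positive")
    return audioSeconds / speedup
}

// One minute of audio at ~110x real time.
let estimate = expectedProcessingSeconds(audioSeconds: 60, speedup: 110)
print("Estimated processing time: \(estimate) s")
```

To report your own number, time a `transcribe` call and pass the clip length and elapsed seconds to `realTimeSpeedup(audioSeconds:processingSeconds:)`.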
## Usage
For the quickest integration, use the FluidAudio Swift framework, which handles model loading, audio preprocessing, and decoding.
### Swift (FluidAudio)
```swift
import AVFoundation
import FluidAudio

Task {
    do {
        // Download and load the ASR models (downloaded on first run only)
        let models = try await AsrModels.downloadAndLoad()

        // Initialize the ASR manager with the default configuration
        let asr = AsrManager(config: .default)
        try await asr.initialize(models: models)

        // Load the audio file and transcribe it
        let samples = try await AudioProcessor.loadAudioFile(path: "path/to/audio.wav")
        let result = try await asr.transcribe(samples, source: .system)
        print(result.text)

        // Clean up the manager when finished
        asr.cleanup()
    } catch {
        // Surface failures instead of letting the Task swallow them silently
        print("Transcription failed: \(error)")
    }
}
```
For more examples (including CLI usage and benchmarking), see the FluidAudio repository: https://github.com/FluidInference/FluidAudio
## Files
- Core ML model artifacts suitable for use via the FluidAudio APIs (preferred) or directly with Core ML.
- Tokenizer and configuration assets are included and are handled by FluidAudio's loaders.
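To use the artifacts directly with Core ML rather than through FluidAudio, a compiled model can be loaded with `MLModel(contentsOf:configuration:)` and restricted to CPU and Neural Engine through `MLModelConfiguration.computeUnits`. This is a minimal sketch: the helper name `loadCompiledModel(at:computeUnits:)` and the model path are hypothetical, and it does not cover the model's input/output names, the audio front end, or TDT decoding, which FluidAudio implements. If an artifact ships as `.mlpackage` rather than `.mlmodelc`, it must be compiled first (for example with `MLModel.compileModel(at:)`).

```swift
import Foundation

#if canImport(CoreML)
import CoreML

/// Hypothetical helper: loads one compiled Core ML model (.mlmodelc),
/// limiting execution to the CPU and Apple Neural Engine (ANE).
@available(macOS 14.0, iOS 17.0, *)
func loadCompiledModel(at url: URL,
                       computeUnits: MLComputeUnits = .cpuAndNeuralEngine) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = computeUnits
    // Throws if the path does not point to a valid compiled model.
    return try MLModel(contentsOf: url, configuration: config)
}
#endif
```

For example, load `URL(fileURLWithPath: "path/to/Model.mlmodelc")` (a placeholder path) and inspect `model.modelDescription.inputDescriptionsByName` to discover which inputs that model expects.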
## Limitations
- Primary coverage is European languages; performance may degrade for non‑European languages.
## License
Apache 2.0. See the FluidAudio repository for details and usage guidance.