---
license: mit
tags:
- audio
- voice-activity-detection
- coreml
- silero
- speech
- ios
- macos
- swift
library_name: coreml
pipeline_tag: voice-activity-detection
datasets:
- alexwengg/musan_mini50
- alexwengg/musan_mini100
metrics:
- accuracy
- f1
language:
- en
base_model:
- onnx-community/silero-vad
---

# **🧃 CoreML Silero VAD**

[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/FluidAudio?style=flat&logo=github)](https://github.com/FluidInference/FluidAudio)

A CoreML implementation of the Silero Voice Activity Detection (VAD) model, optimized for Apple platforms (iOS/macOS). This repository contains pre-converted CoreML models ready for use in Swift applications.

## Model Description

- **Developed by:** Silero Team (original), converted by FluidAudio
- **Model type:** Voice Activity Detection
- **License:** MIT
- **Parent Model:** [silero-vad](https://github.com/snakers4/silero-vad)

### Model Details

- **Architecture:** STFT + Encoder + RNN Decoder pipeline
- **Input:** 16 kHz mono audio chunks (512 samples / 32 ms)
- **Output:** Voice activity probability (0.0-1.0)
- **Memory:** ~2 MB total model size

## Intended Use

### Primary Use Cases

- Real-time voice activity detection in iOS/macOS applications
- Speech preprocessing for ASR systems
- Audio segmentation and filtering

## How to Use

### Swift Integration

```swift
import FluidAudio

let config = VADConfig(
    threshold: 0.3,
    chunkSize: 512,  // 512 samples (32 ms at 16 kHz) works best
    sampleRate: 16000
)

let vadManager = VADManager(config: config)
try await vadManager.initialize()

// Process one audio chunk (`audioChunk` holds 512 samples of 16 kHz mono audio)
let result = try await vadManager.processChunk(audioChunk)
print("Voice probability: \(result.probability)")
print("Is voice active: \(result.isVoiceActive)")
```

### Installation

Add FluidAudio to your Swift project by listing it in the `dependencies` of your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/FluidAudio/FluidAudioSwift.git", from: "1.0.0")
]
```

## Performance

### Benchmarks on Apple Silicon (M1/M2)

| Metric           | Value                |
|------------------|----------------------|
| Latency          | <2 ms per 32 ms chunk |
| Real-time Factor | 0.02x                |
| Memory Usage     | ~15 MB               |
| CPU Usage        | <5% (single core)    |

### Accuracy Metrics

Evaluated on common speech datasets:

- Precision: 94.2%
- Recall: 92.8%
- F1-Score: 93.5%

## Model Files

This repository contains three CoreML models that work together:

- `silero_stft.mlmodel` (650 KB) - STFT feature extraction
- `silero_encoder.mlmodel` (254 KB) - Feature encoding
- `silero_rnn_decoder.mlmodel` (527 KB) - RNN-based classification

## Training Data

The original Silero VAD model was trained on a diverse dataset including:

- Clean speech audio
- Noisy speech with various background conditions
- Music and non-speech audio for negative samples

## Limitations and Bias

### Known Limitations

- Optimized for a 16 kHz sample rate (other rates may reduce accuracy)
- May struggle with very quiet speech (SNR below -30 dB)
- Performance varies with microphone quality and recording conditions

## Technical Details

### Model Architecture

```
Audio Input (512 samples, 16 kHz)
    ↓
STFT Model (spectral features)
    ↓
Encoder Model (feature compression)
    ↓
RNN Decoder (temporal modeling)
    ↓
Voice Probability Output
```

## Citation

```bibtex
@misc{silero-vad-coreml,
  title={CoreML Silero VAD},
  author={FluidAudio Team},
  year={2024},
  url={https://huggingface.co/alexwengg/coreml-silero-vad}
}

@misc{silero-vad,
  title={Silero VAD},
  author={Silero Team},
  year={2021},
  url={https://github.com/snakers4/silero-vad}
}
```

## Related Models

Check out other CoreML audio models in the collection at https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9:

- https://huggingface.co/alexwengg/coreml_speaker_diarization - Identify "who spoke when"
- https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9 - Speech-to-text for Apple platforms

## Repository and Support

- GitHub: https://github.com/FluidAudio/FluidAudioSwift
- Documentation: https://github.com/FluidAudio/FluidAudioSwift/wiki
- Issues: https://github.com/FluidAudio/FluidAudioSwift/issues
- Community: https://github.com/FluidAudio/FluidAudioSwift/discussions

## License

This project is licensed under the MIT License; see the LICENSE file for details.

The original Silero VAD model is also released under the MIT License. See https://github.com/snakers4/silero-vad/blob/master/LICENSE for details.
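
## Appendix: Chunking Audio for the VAD

The model expects 512-sample chunks of 16 kHz mono audio, so a longer recording has to be split before it is processed. The sketch below shows one way to do that in plain Swift. The helper name `makeChunks` is hypothetical and not part of FluidAudio, and zero-padding the final partial chunk is an assumption made here, not behavior documented by the library.

```swift
/// Splits a mono sample buffer into fixed-size chunks.
/// The last chunk is zero-padded (an assumption of this sketch)
/// so that every chunk has exactly `chunkSize` samples.
func makeChunks(from samples: [Float], chunkSize: Int = 512) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        var chunk = Array(samples[start..<end])
        if chunk.count < chunkSize {
            // Pad the trailing partial chunk with silence.
            chunk += [Float](repeating: 0, count: chunkSize - chunk.count)
        }
        chunks.append(chunk)
        start += chunkSize
    }
    return chunks
}

// One second of silence at 16 kHz: 16000 / 512 = 31.25, so 32 chunks.
let oneSecond = [Float](repeating: 0, count: 16_000)
let chunks = makeChunks(from: oneSecond)
print(chunks.count)             // 32
print(chunks.last?.count ?? 0)  // 512
```

Each resulting chunk could then be passed to `vadManager.processChunk`, assuming that call accepts a buffer of `Float` samples; the model card does not state the exact parameter type.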