FluidInference/speaker-diarization-coreml

@@ -20,45 +20,6 @@ State-of-the-art speaker diarization models optimized for Apple Neural Engine, p
 This repository contains CoreML-optimized speaker diarization models specifically converted and optimized for Apple devices (macOS 13.0+, iOS 16.0+). These models enable efficient on-device speaker diarization with minimal power consumption while maintaining state-of-the-art accuracy.
-### Key Features
-- **Apple Neural Engine Optimized**: Zero performance trade-offs with maximum efficiency
-- **Real-time Processing**: RTF of 0.02x (50x faster than real-time)
-- **Research-Competitive**: DER of 17.7% on AMI benchmark
-- **Power Efficient**: Designed for maximum performance per watt
-- **Privacy-First**: All processing happens on-device
-## Intended Uses & Limitations
-### Intended Uses
-- **Meeting Transcription**: Real-time speaker identification in meetings
-- **Voice Assistants**: Multi-speaker conversation understanding
-- **Media Production**: Automated speaker labeling for podcasts/interviews
-- **Research**: Academic research in speaker diarization
-- **Privacy-Focused Applications**: On-device processing without cloud dependencies
-### Limitations
-- Optimized for 16kHz audio input
-- Best performance with clear audio (no heavy background noise)
-- May struggle with heavily overlapping speech
-- Requires Apple devices with CoreML support
-### Technical Specifications
-- **Input**: 16kHz mono audio
-- **Output**: Speaker segments with timestamps and IDs
-- **Framework**: CoreML (converted from PyTorch)
-- **Optimization**: Apple Neural Engine (ANE) optimized operations
-- **Precision**: FP32
-## Training Data
-These models are converted from open-source variants trained on diverse speaker diarization datasets. The original models were trained on:
-- Multi-speaker conversation datasets
-- Various acoustic conditions
-- Multiple languages and accents
-*Note: Specific training data details depend on the original open-source model variant.*
 ## Usage
 See the SDK for more details [https://github.com/FluidInference/FluidAudio](https://github.com/FluidInference/FluidAudio)
@@ -116,3 +77,41 @@ pyannote-audio - State-of-the-art diarization research
 wespeaker - Speaker embedding techniques

+### Key Features
+- **Apple Neural Engine Optimized**: Zero performance trade-offs with maximum efficiency
+- **Real-time Processing**: RTF of 0.02x (50x faster than real-time)
+- **Research-Competitive**: DER of 17.7% on AMI benchmark
+- **Power Efficient**: Designed for maximum performance per watt
+- **Privacy-First**: All processing happens on-device
+## Intended Uses & Limitations
+### Intended Uses
+- **Meeting Transcription**: Real-time speaker identification in meetings
+- **Voice Assistants**: Multi-speaker conversation understanding
+- **Media Production**: Automated speaker labeling for podcasts/interviews
+- **Research**: Academic research in speaker diarization
+- **Privacy-Focused Applications**: On-device processing without cloud dependencies
+### Limitations
+- Optimized for 16kHz audio input
+- Best performance with clear audio (no heavy background noise)
+- May struggle with heavily overlapping speech
+- Requires Apple devices with CoreML support
+### Technical Specifications
+- **Input**: 16kHz mono audio
+- **Output**: Speaker segments with timestamps and IDs
+- **Framework**: CoreML (converted from PyTorch)
+- **Optimization**: Apple Neural Engine (ANE) optimized operations
+- **Precision**: FP32
+## Training Data
+These models are converted from open-source variants trained on diverse speaker diarization datasets. The original models were trained on:
+- Multi-speaker conversation datasets
+- Various acoustic conditions
+- Multiple languages and accents
+*Note: Specific training data details depend on the original open-source model variant.*
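
The two headline metrics in the relocated "Key Features" section follow standard definitions. The sketch below is not part of this repository or the FluidAudio SDK; it only illustrates the arithmetic, assuming RTF is processing time divided by audio duration and DER is the sum of false alarm, missed speech, and speaker confusion time divided by total reference speech time. All function names and the sample numbers are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than real time a given RTF is (1 / RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def diarization_error_rate(
    false_alarm: float, missed: float, confusion: float, total_speech: float
) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / reference speech.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


if __name__ == "__main__":
    # Hypothetical timing: 60 s of audio processed in 1.2 s.
    rtf = real_time_factor(processing_seconds=1.2, audio_seconds=60.0)
    print(f"RTF: {rtf:.2f}")                                   # RTF: 0.02
    print(f"Speedup: {speedup_over_real_time(rtf):.0f}x")      # Speedup: 50x

    # Made-up error durations, for illustration only (not the AMI breakdown).
    der = diarization_error_rate(
        false_alarm=18.0, missed=30.0, confusion=42.0, total_speech=600.0
    )
    print(f"DER: {der:.1%}")                                   # DER: 15.0%
```

Read this way, the stated RTF of 0.02x corresponds to roughly 1.2 seconds of compute per minute of audio, which is where the "50x faster than real-time" figure comes from.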