bweng committed · verified
Commit 7443ce9 · Parent: f83a630

Update README.md

Files changed (1):
  1. README.md (+38 −39)

README.md CHANGED
@@ -20,45 +20,6 @@ State-of-the-art speaker diarization models optimized for Apple Neural Engine, p
 
  This repository contains CoreML-optimized speaker diarization models specifically converted and optimized for Apple devices (macOS 13.0+, iOS 16.0+). These models enable efficient on-device speaker diarization with minimal power consumption while maintaining state-of-the-art accuracy.
 
- ### Key Features
- - **Apple Neural Engine Optimized**: Zero performance trade-offs with maximum efficiency
- - **Real-time Processing**: RTF of 0.02x (50x faster than real-time)
- - **Research-Competitive**: DER of 17.7% on AMI benchmark
- - **Power Efficient**: Designed for maximum performance per watt
- - **Privacy-First**: All processing happens on-device
-
-
- ## Intended Uses & Limitations
-
- ### Intended Uses
- - **Meeting Transcription**: Real-time speaker identification in meetings
- - **Voice Assistants**: Multi-speaker conversation understanding
- - **Media Production**: Automated speaker labeling for podcasts/interviews
- - **Research**: Academic research in speaker diarization
- - **Privacy-Focused Applications**: On-device processing without cloud dependencies
-
- ### Limitations
- - Optimized for 16kHz audio input
- - Best performance with clear audio (no heavy background noise)
- - May struggle with heavily overlapping speech
- - Requires Apple devices with CoreML support
-
- ### Technical Specifications
- - **Input**: 16kHz mono audio
- - **Output**: Speaker segments with timestamps and IDs
- - **Framework**: CoreML (converted from PyTorch)
- - **Optimization**: Apple Neural Engine (ANE) optimized operations
- - **Precision**: FP32
-
- ## Training Data
-
- These models are converted from open-source variants trained on diverse speaker diarization datasets. The original models were trained on:
- - Multi-speaker conversation datasets
- - Various acoustic conditions
- - Multiple languages and accents
-
- *Note: Specific training data details depend on the original open-source model variant.*
-
  ## Usage
 
  See the SDK for more details [https://github.com/FluidInference/FluidAudio](https://github.com/FluidInference/FluidAudio)
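
As a rough illustration of what the loading step looks like outside the SDK, here is a minimal sketch using plain CoreML. This is not the documented FluidAudio API, and the model filename is a placeholder for one of the compiled `.mlmodelc` bundles in this repo; error handling is elided.

```swift
import CoreML

// Allow CoreML to schedule work on the Apple Neural Engine where possible.
let config = MLModelConfiguration()
config.computeUnits = .all

// Placeholder path: substitute an actual compiled model bundle from this repo.
let modelURL = URL(fileURLWithPath: "speaker_segmentation.mlmodelc")
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Inspect the expected input/output tensors before wiring up audio buffers.
print(model.modelDescription.inputDescriptionsByName)
print(model.modelDescription.outputDescriptionsByName)
```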
@@ -116,3 +77,41 @@ pyannote-audio - State-of-the-art diarization research
  wespeaker - Speaker embedding techniques
 
 
+ ### Key Features
+ - **Apple Neural Engine Optimized**: Zero performance trade-offs with maximum efficiency
+ - **Real-time Processing**: RTF of 0.02x (50x faster than real-time)
+ - **Research-Competitive**: DER of 17.7% on AMI benchmark
+ - **Power Efficient**: Designed for maximum performance per watt
+ - **Privacy-First**: All processing happens on-device
+
+
+ ## Intended Uses & Limitations
+
+ ### Intended Uses
+ - **Meeting Transcription**: Real-time speaker identification in meetings
+ - **Voice Assistants**: Multi-speaker conversation understanding
+ - **Media Production**: Automated speaker labeling for podcasts/interviews
+ - **Research**: Academic research in speaker diarization
+ - **Privacy-Focused Applications**: On-device processing without cloud dependencies
+
+ ### Limitations
+ - Optimized for 16kHz audio input
+ - Best performance with clear audio (no heavy background noise)
+ - May struggle with heavily overlapping speech
+ - Requires Apple devices with CoreML support
+
+ ### Technical Specifications
+ - **Input**: 16kHz mono audio
+ - **Output**: Speaker segments with timestamps and IDs
+ - **Framework**: CoreML (converted from PyTorch)
+ - **Optimization**: Apple Neural Engine (ANE) optimized operations
+ - **Precision**: FP32
+
+ ## Training Data
+
+ These models are converted from open-source variants trained on diverse speaker diarization datasets. The original models were trained on:
+ - Multi-speaker conversation datasets
+ - Various acoustic conditions
+ - Multiple languages and accents
+
+ *Note: Specific training data details depend on the original open-source model variant.*
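
The Technical Specifications added above call for 16 kHz mono input. As a rough illustration (a minimal sketch, not part of the commit or the SDK), the snippet below resamples an audio file to that format with AVFoundation; the file path is a placeholder and error handling is elided.

```swift
import AVFoundation

// Read the source file in its native format (placeholder path).
let file = try AVAudioFile(forReading: URL(fileURLWithPath: "meeting.wav"))

// Target format per the specs: 16 kHz, mono, Float32.
let target = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                           sampleRate: 16_000,
                           channels: 1,
                           interleaved: false)!
let converter = AVAudioConverter(from: file.processingFormat, to: target)!

// Load the whole file into a buffer (fine for short clips; stream for long audio).
let inBuffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                frameCapacity: AVAudioFrameCount(file.length))!
try file.read(into: inBuffer)

// Size the output buffer by the sample-rate ratio.
let ratio = target.sampleRate / file.processingFormat.sampleRate
let outBuffer = AVAudioPCMBuffer(pcmFormat: target,
                                 frameCapacity: AVAudioFrameCount(Double(inBuffer.frameLength) * ratio) + 1)!

// Feed the input buffer once, then signal end-of-stream.
var consumed = false
converter.convert(to: outBuffer, error: nil) { _, status in
    if consumed { status.pointee = .endOfStream; return nil }
    consumed = true
    status.pointee = .haveData
    return inBuffer
}
// outBuffer now holds 16 kHz mono samples ready for the diarization models.
```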