---
language:
- en
---

An End-to-End Neural Diarization (EEND) model trained on the AMI-headset dataset.

This example can be found at `egs2/ami/diar1`.
## Configurations

- Use ESPnet's default frontend to extract features. The sampling rate is 8000 Hz, with a frame length of 25 ms and a frame shift of 10 ms. The frontend extracts 23 log-scaled Mel filterbank features per frame.
- Follow the frame concatenation and subsampling strategy described in paper [2]. Each frame is concatenated with its 7 preceding and 7 following frames (15 frames in total), and the result is subsampled by a factor of 10. This yields one 345-dimensional acoustic feature vector (23 × 15) every 100 ms.
- Training and testing are performed exclusively on data with 4 speakers.
- Use a stack of 4 Transformer encoder layers, each producing 256-dimensional frame-wise embeddings.
- Train for 500 epochs.
- Detailed configurations are defined in `exp/diar/train_diar_diar_raw/config.yaml`.
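
The frame concatenation and subsampling step can be illustrated with a minimal pure-Python sketch. It is not the recipe's actual implementation: the function name `splice_and_subsample` is hypothetical, and zero-padding at the utterance edges is an assumption, since the configuration above does not say how edge frames are handled.

```python
def splice_and_subsample(frames, context=7, subsample=10):
    """Concatenate each frame with `context` neighbours on both sides,
    then keep every `subsample`-th spliced frame.

    `frames` is a list of equal-length feature vectors (lists of floats).
    Edge frames are zero-padded, which is an assumption of this sketch.
    """
    if not frames:
        return []
    dim = len(frames[0])
    zero = [0.0] * dim
    padded = [zero] * context + list(frames) + [zero] * context
    spliced = []
    for t in range(0, len(frames), subsample):
        # padded[t + context] is the original frame t, so this window is
        # centred on frame t and spans 2 * context + 1 frames.
        window = padded[t : t + 2 * context + 1]
        spliced.append([value for frame in window for value in frame])
    return spliced


# One second of 23-dimensional features at a 10 ms frame shift (100 frames).
features = [[float(t)] * 23 for t in range(100)]
spliced = splice_and_subsample(features)
print(len(spliced), len(spliced[0]))  # 10 345
```

With a 10 ms frame shift and a subsampling factor of 10, the 100 input frames collapse to 10 vectors of 23 × 15 = 345 values, one per 100 ms.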
## RESULTS

### Environments

- date: `Thu Dec 19 22:03:53 EST 2024`
- python version: `3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]`
- espnet version: `espnet 202409`
- pytorch version: `pytorch 2.4.0`
- Git hash: `c12b3d59ca4fd8847edf274e56a1716474d2a30e`
- Commit date: `Thu Dec 19 21:58:26 2024 -0500`
### diar_train_diar_raw

#### DER

diarized_test

|threshold_median_collar|DER (%)|
|---|---|
|result_th0.3_med11_collar0.0|71.73|
|result_th0.3_med1_collar0.0|74.62|
|result_th0.4_med11_collar0.0|70.10|
|result_th0.4_med1_collar0.0|71.98|
|result_th0.5_med11_collar0.0|70.57|
|result_th0.5_med1_collar0.0|72.44|
|result_th0.6_med11_collar0.0|72.64|
|result_th0.6_med1_collar0.0|74.63|
|result_th0.7_med11_collar0.0|76.52|
|result_th0.7_med1_collar0.0|78.41|
### diar_train_diar_raw

#### DER

diarized_dev

|threshold_median_collar|DER (%)|
|---|---|
|result_th0.3_med11_collar0.0|75.88|
|result_th0.3_med1_collar0.0|78.21|
|result_th0.4_med11_collar0.0|71.45|
|result_th0.4_med1_collar0.0|73.32|
|result_th0.5_med11_collar0.0|70.53|
|result_th0.5_med1_collar0.0|72.34|
|result_th0.6_med11_collar0.0|72.03|
|result_th0.6_med1_collar0.0|73.96|
|result_th0.7_med11_collar0.0|76.66|
|result_th0.7_med1_collar0.0|78.33|
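
The row names in the tables combine a decision threshold (`th`), a median filter width in frames (`med`), and a scoring collar in seconds (`collar`). As a rough illustration of how the first two settings could shape the output, the sketch below binarizes per-frame speaker posteriors and smooths each speaker track with a median filter. It is a sketch under assumptions, not the scoring script used for these results: the function name `binarize`, the strict `>` comparison, and zero-padding at the edges are all hypothetical choices.

```python
from statistics import median


def binarize(posteriors, threshold=0.5, median_window=11):
    """Turn per-frame speaker posteriors into smoothed 0/1 activity.

    `posteriors` is a list of frames, each a list with one probability per
    speaker. Each speaker track is thresholded, then median-filtered with an
    odd window; edges are zero-padded (an assumption of this sketch).
    """
    if median_window % 2 == 0:
        raise ValueError("median_window must be odd")
    if not posteriors:
        return []
    half = median_window // 2
    num_speakers = len(posteriors[0])
    tracks = []
    for spk in range(num_speakers):
        hard = [1 if frame[spk] > threshold else 0 for frame in posteriors]
        padded = [0] * half + hard + [0] * half
        tracks.append(
            [int(median(padded[t : t + median_window])) for t in range(len(hard))]
        )
    # Transpose back to frame-major order: one row per frame.
    return [list(frame) for frame in zip(*tracks)]


# Single-speaker toy input: an isolated spike, then a short gap in speech.
toy = [[0.1], [0.9], [0.2], [0.8], [0.9], [0.7], [0.1]]
print(binarize(toy, threshold=0.5, median_window=3))
# [[0], [0], [1], [1], [1], [1], [0]]
```

In the toy example the isolated active frame is removed and the one-frame gap is filled. A window of 1 (`med1`) leaves the thresholded decisions unchanged, which is consistent with the `med11` rows scoring lower DER than the `med1` rows at every threshold in both tables.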