Commit 2d8c1a6 (verified) by nguyenvulebinh · 1 parent: e2e0dbf

Update README.md

  1. README.md +35 -10
README.md CHANGED
@@ -5,13 +5,17 @@ tags: []
5
  # MSA-ASR
6
  Multilingual Speaker-Attributed Automatic Speech Recognition
7
 
8
  ### Introduction
9
 
10
  This repository provides an implementation of a Speaker-Attributed Automatic Speech Recognition model. The model performs both multilingual speech recognition and speaker embedding extraction, enabling speaker differentiation.
11
 
12
  Model architecture
13
 
14
- ![MSA-ASR Model](https://github.com/nguyenvulebinh/MSA-ASR/blob/679f7016c1b0610c5ae5f85fae2168096491b464/resource/model.png?raw=true)
15
 
16
 
17
  ### Setup
@@ -30,18 +34,39 @@ Test script:
30
  python infer.py
31
  ```
32
 
33
  ### Citation
34
 
35
  ```bibtex
36
- @misc{nguyen2025msaasrefficientmultilingualspeaker,
37
- title={MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models},
38
- author={Thai-Binh Nguyen and Alexander Waibel},
39
- year={2025},
40
- eprint={2411.18152},
41
- archivePrefix={arXiv},
42
- primaryClass={cs.CL},
43
- url={https://arxiv.org/abs/2411.18152},
44
- }
45
  ```
46
 
47
  ### License
 
5
  # MSA-ASR
6
  Multilingual Speaker-Attributed Automatic Speech Recognition
7
 
8
+ ### Demo
9
+
10
+ <video src="https://huggingface.co/nguyenvulebinh/MSA-ASR/resolve/main/demo_sa-asr.mp4" width="640" height="480" controls></video>
11
+
12
  ### Introduction
13
 
14
  This repository provides an implementation of a Speaker-Attributed Automatic Speech Recognition model. The model performs both multilingual speech recognition and speaker embedding extraction, enabling speaker differentiation.
15
 
16
  Model architecture
17
 
18
+ ![MSA-ASR Model](https://github.com/nguyenvulebinh/MSA-ASR/blob/main/resource/model.png?raw=true)
19
 
20
 
21
  ### Setup
 
34
  python infer.py
35
  ```
36
 
37
+ ### Training Dataset
38
+
39
+ *From ASR to SA-ASR dataset:*
40
+
41
+ - Segment ASR data into single-speaker turns.
42
+ - Group turns that likely come from the same speaker, using cosine similarity between speaker embeddings.
43
+ - Pick a few groups, and a few turns from each group.
44
+ - Concatenate turns in random order.
45
+
46
+ ![MSA-ASR Dataset](https://github.com/nguyenvulebinh/MSA-ASR/blob/main/resource/sa_asr_data_pipeline.png?raw=true)
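
The four steps listed above can be illustrated with a minimal sketch in plain Python. It is not the repository's actual pipeline: the `Turn` class, the greedy grouping rule, the `threshold` value, and the function names are all assumptions made for illustration, and the README does not say how the speaker embeddings are computed or what similarity cutoff is used.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One single-speaker turn cut from an ASR utterance (hypothetical structure)."""
    turn_id: str
    embedding: List[float]  # speaker embedding of the turn
    samples: List[float]    # audio samples of the turn
    text: str               # transcript of the turn


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_turns(turns: List[Turn], threshold: float = 0.8) -> List[List[Turn]]:
    """Greedy grouping: a turn joins the first group whose first member
    is at least `threshold` similar to it, otherwise it starts a new group."""
    groups: List[List[Turn]] = []
    for turn in turns:
        for group in groups:
            if cosine(turn.embedding, group[0].embedding) >= threshold:
                group.append(turn)
                break
        else:
            groups.append([turn])
    return groups


def build_sample(
    groups: List[List[Turn]],
    n_groups: int = 2,
    turns_per_group: int = 2,
    rng: random.Random = None,
) -> Tuple[List[float], List[Tuple[int, str]]]:
    """Pick a few groups and a few turns from each, then concatenate
    the turns in random order. Returns (audio, [(speaker_label, text), ...])."""
    rng = rng or random.Random()
    chosen = rng.sample(groups, min(n_groups, len(groups)))
    picked = []
    for label, group in enumerate(chosen):
        for turn in rng.sample(group, min(turns_per_group, len(group))):
            picked.append((label, turn))
    rng.shuffle(picked)
    audio = [s for _, turn in picked for s in turn.samples]
    transcript = [(label, turn.text) for label, turn in picked]
    return audio, transcript
```

In this sketch each group index serves as a pseudo speaker label, so the concatenated audio comes with a speaker-attributed transcript. Comparing against only the first member of a group keeps the example short; a centroid or a proper clustering method would be a reasonable alternative.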
47
+
48
+ *In total:*
49
+
50
+ - 15.5M turns
51
+ - 14k audio hours
52
+ - English only
53
+
54
+ The dataset is openly available on [HF Dataset](https://huggingface.co/datasets/nguyenvulebinh/spk-attribute).
55
+
56
  ### Citation
57
 
58
  ```bibtex
59
+ @INPROCEEDINGS{10889116,
60
+ author={Nguyen, Thai-Binh and Waibel, Alexander},
61
+ booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
62
+ title={MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models},
63
+ year={2025},
64
+ volume={},
65
+ number={},
66
+ pages={1-5},
67
+ keywords={Training;Adaptation models;Limiting;Predictive models;Data models;Robustness;Multilingual;Data mining;Speech processing;Standards;speaker-attributed;asr;multilingual},
68
+ doi={10.1109/ICASSP49660.2025.10889116}}
69
+
70
  ```
71
 
72
  ### License