Update README.md
README.md
CHANGED
@@ -51,7 +51,65 @@ python infer.py
 - 14k audio hours
 - English only

-Dataset is

 ### Citation

@@ -67,6 +125,18 @@ Dataset is open available in [HF Dataset](https://huggingface.co/datasets/nguyen
 keywords={Training;Adaptation models;Limiting;Predictive models;Data models;Robustness;Multilingual;Data mining;Speech processing;Standards;speaker-attributed;asr;multilingual},
 doi={10.1109/ICASSP49660.2025.10889116}}

 ```

 ### License

@@ -51,7 +51,65 @@
 - 14k audio hours
 - English only

+Dataset is openly available in [HF Dataset](https://huggingface.co/datasets/nguyenvulebinh/spk-attribute)
+
+*Example*
+
+Audio
+
+<audio controls>
+<source src="https://huggingface.co/nguyenvulebinh/MSA-ASR/resolve/main/sample_augment.wav" type="audio/wav">
+Your browser does not support the audio element.
+</audio>
+
+
+Label:
+
+```code
+spk_1 A 0.00 1.58 »spk_1
+spk_1 A 0.00 1.58 Pacifica
+spk_1 A 1.58 0.68 continues
+spk_1 A 2.27 0.52 today
+spk_1 A 2.79 0.24 to
+spk_1 A 3.03 0.20 be
+spk_1 A 3.23 0.14 a
+spk_1 A 3.37 0.54 listener
+spk_1 A 3.91 0.80 supported
+spk_1 A 4.71 0.70 network
+spk_1 A 5.42 0.38 of
+spk_2 A 5.80 0.12 »spk_2
+spk_2 A 5.80 0.12 At
+spk_2 A 5.92 0.42 home,
+spk_2 A 6.34 0.18 an
+spk_2 A 6.52 0.38 Aed
+spk_2 A 6.90 0.26 is
+spk_2 A 7.16 0.18 an
+spk_2 A 7.34 0.56 automated
+spk_2 A 7.90 0.60 external
+spk_2 A 8.50 0.90 defibrillator.
+spk_2 A 9.40 0.40 It's
+spk_2 A 9.81 0.08 the
+spk_2 A 9.89 0.36 device
+spk_2 A 10.25 0.08 you
+spk_2 A 10.33 0.16 use
+spk_2 A 10.49 0.12 when
+spk_2 A 10.61 0.10 your
+spk_2 A 10.73 0.16 heart
+spk_2 A 10.89 0.18 goes
+spk_2 A 11.07 0.12 into
+spk_2 A 11.19 0.38 cardiac
+spk_2 A 11.57 0.38 arrest
+spk_2 A 11.95 0.18 to
+spk_2 A 12.13 0.36 shock
+spk_2 A 12.49 0.14 it
+spk_2 A 12.63 0.28 back
+spk_2 A 12.91 0.22 into
+spk_2 A 13.13 0.06 a
+spk_2 A 13.19 0.32 normal
+spk_2 A 13.51 0.88 rhythm.
+spk_1 A 14.40 1.38 »spk_1
+spk_1 A 14.40 1.38 stations.
+```
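
The label lines in the example above look like a CTM-style word list with five whitespace-separated fields: speaker, channel, start time, duration, token. The README does not document these columns, so that reading, the treatment of `»spk_N` tokens as speaker-change markers, and the names `Word`, `parse_label`, and `group_turns` are all assumptions made for illustration. Under those assumptions, a minimal Python sketch that groups the words into speaker turns might look like this:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    speaker: str
    channel: str
    start: float      # assumed: start time in seconds
    duration: float   # assumed: duration in seconds
    text: str


def parse_label(lines: List[str]) -> List[Word]:
    """Parse 'speaker channel start duration token' lines into Word records."""
    words = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        speaker, channel, start, duration, text = parts
        if text.startswith("»"):
            continue  # assumed speaker-change marker, not a spoken word
        words.append(Word(speaker, channel, float(start), float(duration), text))
    return words


def group_turns(words: List[Word]) -> List[Tuple[str, float, float, str]]:
    """Merge consecutive words by the same speaker into (speaker, start, end, text)."""
    turns: List[Tuple[str, float, float, str]] = []
    for w in words:
        end = round(w.start + w.duration, 2)
        if turns and turns[-1][0] == w.speaker:
            spk, start, _, text = turns[-1]
            turns[-1] = (spk, start, end, text + " " + w.text)
        else:
            turns.append((w.speaker, w.start, end, w.text))
    return turns


# A few lines copied from the example label for demonstration.
sample = [
    "spk_1 A 0.00 1.58 »spk_1",
    "spk_1 A 0.00 1.58 Pacifica",
    "spk_1 A 1.58 0.68 continues",
    "spk_2 A 5.80 0.12 »spk_2",
    "spk_2 A 5.80 0.12 At",
    "spk_2 A 5.92 0.42 home,",
]

for turn in group_turns(parse_label(sample)):
    print(turn)
```

Because the grouping only merges consecutive words, a speaker who returns later (as `spk_1` does at 14.40 in the example) starts a new turn rather than extending the earlier one.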

 ### Citation

@@ -67,6 +125,18 @@
 keywords={Training;Adaptation models;Limiting;Predictive models;Data models;Robustness;Multilingual;Data mining;Speech processing;Standards;speaker-attributed;asr;multilingual},
 doi={10.1109/ICASSP49660.2025.10889116}}

+@INPROCEEDINGS{10446589,
+author={Nguyen, Thai-Binh and Waibel, Alexander},
+booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+title={Synthetic Conversations Improve Multi-Talker ASR},
+year={2024},
+volume={},
+number={},
+pages={10461-10465},
+keywords={Systematics;Error analysis;Knowledge based systems;Oral communication;Signal processing;Data models;Acoustics;multi-talker;asr;synthetic conversation},
+doi={10.1109/ICASSP48485.2024.10446589}}
+
+
 ```

 ### License