The pre-training datasets include: AudioSet (vocalization), FreeSound (babies), …

We evaluate voc2vec on six datasets: ASVP-ESD, ASVP-ESD (babies), CNVVE, NonVerbal Vocalization Dataset, Donate a Cry, VIVAE.

The following table reports the average performance in terms of Unweighted Average Recall (UAR) and F1 Macro across the six datasets described above; a short sketch of how these two metrics are computed follows the table.

| Model | Architecture | Pre-training data | UAR | F1 Macro |
|-------|--------------|-------------------|-----|----------|
| **voc2vec** | wav2vec 2.0 | Voc125 | .612±.212 | .580±.230 |
| **voc2vec-as-pt** | wav2vec 2.0 | AudioSet + Voc125 | .603±.183 | .574±.194 |
| **voc2vec-ls-pt** | wav2vec 2.0 | LibriSpeech + Voc125 | .661±.206 | .636±.223 |
| **voc2vec-hubert-ls-pt** | HuBERT | LibriSpeech + Voc125 | **.696±.189** | **.678±.200** |
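
UAR (Unweighted Average Recall) is per-class recall averaged with equal weight, i.e. macro-averaged recall, so every class counts equally regardless of how many samples it has. Below is a minimal sketch of how both metrics can be computed, assuming scikit-learn is available (it is not a stated dependency of this repo):

```python
from sklearn.metrics import f1_score, recall_score

# Toy labels, for illustration only.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 1]

# UAR is per-class recall averaged with equal weight over classes,
# i.e. macro-averaged recall in scikit-learn terms.
uar = recall_score(y_true, y_pred, average="macro")
f1_macro = f1_score(y_true, y_pred, average="macro")

print(f"UAR: {uar:.3f} | F1 Macro: {f1_macro:.3f}")
```
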
## Available Models

| Model | Description | Link |
|-------|-------------|------|
| **voc2vec** | Model pre-trained on **125 hours of non-verbal audio**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec) |
| **voc2vec-as-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
| **voc2vec-ls-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
| **voc2vec-hubert-ls-pt** | Continues pre-training from a HuBERT-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-hubert-ls-pt) |

## Usage examples
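
Below is a minimal sketch of loading one of the checkpoints above with the Hugging Face `transformers` library. It assumes 16 kHz mono input and the standard audio-classification interface; `audio.wav` is a placeholder path. Since the released checkpoints are pre-trained rather than fine-tuned, the classification head is randomly initialized, so treat this as a template for fine-tuning rather than a ready-made classifier.

```python
import torch
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Any model id from the table above works here.
model_id = "alkiskoudounas/voc2vec"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)

# wav2vec 2.0-style models expect 16 kHz mono waveforms
# ("audio.wav" is a placeholder path).
waveform, _ = librosa.load("audio.wav", sr=16000, mono=True)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_labels)

print(logits.argmax(dim=-1).item())
```

If you use voc2vec in your work, please cite the ICASSP 2025 paper:
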
```bibtex
@INPROCEEDINGS{koudounas2025icassp,
  author={Koudounas, Alkis and La Quatra, Moreno and Siniscalchi, Sabato Marco and Baralis, Elena},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={voc2vec: A Foundation Model for Non-Verbal Vocalization},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Pediatrics;Accuracy;Foundation models;Benchmark testing;Signal processing;Data models;Acoustics;Speech processing;Nonverbal vocalization;Representation Learning;Self-Supervised Models;Pre-trained Models},
  doi={10.1109/ICASSP49660.2025.10890672}}
```