We evaluate voc2vec-as-pt on six datasets: ASVP-ESD, ASVP-ESD (babies), CNVVE, NonVerbal Vocalization Dataset, Donate a Cry, VIVAE.

The following table reports the average performance, in terms of Unweighted Average Recall (UAR) and F1 Macro, across the six datasets described above.

| Model | Architecture | Pre-training dataset | UAR | F1 Macro |
|-------|--------------|----------------------|-----|----------|
| **voc2vec** | wav2vec 2.0 | Voc125 | .612±.212 | .580±.230 |
| **voc2vec-as-pt** | wav2vec 2.0 | AudioSet + Voc125 | .603±.183 | .574±.194 |
| **voc2vec-ls-pt** | wav2vec 2.0 | LibriSpeech + Voc125 | .661±.206 | .636±.223 |
| **voc2vec-hubert-ls-pt** | HuBERT | LibriSpeech + Voc125 | **.696±.189** | **.678±.200** |
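UAR is the unweighted (macro) average of per-class recall, so rare vocalization classes weigh as much as frequent ones; F1 Macro averages per-class F1 the same way. A minimal self-contained sketch of both metrics (the toy labels are illustrative, not taken from the evaluation):

```python
def uar(y_true, y_pred):
    """Unweighted Average Recall: mean of per-class recall over classes in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls.append(tp / support)
    return sum(recalls) / len(classes)

def f1_macro(y_true, y_pred):
    """Macro F1: mean of per-class F1 over classes in y_true."""
    classes = sorted(set(y_true))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(classes)

# Toy 3-class example (e.g. laugh=0, cry=1, cough=2)
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(round(uar(y_true, y_pred), 4))       # 0.8333
print(round(f1_macro(y_true, y_pred), 4))  # 0.8222
```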
## Available Models

| Model | Description | Link |
|-------|-------------|------|
| **voc2vec** | Model pre-trained on **125 hours of non-verbal audio**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec) |
| **voc2vec-as-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
| **voc2vec-ls-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
| **voc2vec-hubert-ls-pt** | Continues pre-training from a HuBERT-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-hubert-ls-pt) |
## Usage examples
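A minimal inference sketch, assuming the checkpoint is loaded through 🤗 Transformers' audio-classification classes; the model id comes from the table above, but the exact preprocessing and head configuration are assumptions, so adapt as needed:

```python
# Hypothetical usage sketch -- only MODEL_ID is taken from this README.
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

MODEL_ID = "alkiskoudounas/voc2vec"
TARGET_SR = 16000  # wav2vec 2.0-style models expect 16 kHz mono audio

def classify(waveform, sampling_rate=TARGET_SR):
    """Return classification logits for a 1-D float waveform at 16 kHz."""
    feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
    model = AutoModelForAudioClassification.from_pretrained(MODEL_ID)
    model.eval()
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits
```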
## Citation

```bibtex
@INPROCEEDINGS{koudounas2025icassp,
  author={Koudounas, Alkis and La Quatra, Moreno and Siniscalchi, Sabato Marco and Baralis, Elena},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={voc2vec: A Foundation Model for Non-Verbal Vocalization},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Pediatrics;Accuracy;Foundation models;Benchmark testing;Signal processing;Data models;Acoustics;Speech processing;Nonverbal vocalization;Representation Learning;Self-Supervised Models;Pre-trained Models},
  doi={10.1109/ICASSP49660.2025.10890672}}
```