alkiskoudounas committed on
Commit ce6d664 · verified · 1 Parent(s): e7bf015

Updated README

Files changed (1)
  1. README.md +16 -7
README.md CHANGED
@@ -28,13 +28,23 @@ The pre-training datasets include: AudioSet (vocalization), FreeSound (babies),
 
 We evaluate voc2vec on six datasets: ASVP-ESD, ASVP-ESD (babies), CNVVE, NonVerbal Vocalization Dataset, Donate a Cry, VIVAE.
 
+The following table reports the average performance in terms of Unweighted Average Recall (UAR) and F1 Macro across the six datasets described above.
+
+| Model | Architecture | Pre-training DS | UAR | F1 Macro |
+|--------|-------------|-------------|-----------|-----------|
+| **voc2vec** | wav2vec 2.0 | Voc125 | .612±.212 | .580±.230 |
+| **voc2vec-as-pt** | wav2vec 2.0 | AudioSet + Voc125 | .603±.183 | .574±.194 |
+| **voc2vec-ls-pt** | wav2vec 2.0 | LibriSpeech + Voc125 | .661±.206 | .636±.223 |
+| **voc2vec-hubert-ls-pt** | HuBERT | LibriSpeech + Voc125 | **.696±.189** | **.678±.200** |
+
 ## Available Models
 
 | Model | Description | Link |
 |--------|-------------|------|
 | **voc2vec** | Model pre-trained on **125 hours of non-verbal audio**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec) |
-| **voc2vec-as-pt** | Continues pre-training from a model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
-| **voc2vec-ls-pt** | Continues pre-training from a model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
+| **voc2vec-as-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
+| **voc2vec-ls-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
+| **voc2vec-hubert-ls-pt** | Continues pre-training from a HuBERT-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-hubert-ls-pt) |
 
 ## Usage examples
 
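For clarity on the metrics in the results table added above: UAR (Unweighted Average Recall) is per-class recall averaged without class-frequency weighting, which scikit-learn exposes as macro-averaged recall. A minimal sketch of how the two reported metrics can be computed (the label arrays below are illustrative placeholders, not data from the paper):

```python
# Minimal sketch: UAR and F1 Macro as reported in the results table.
# UAR (Unweighted Average Recall) averages per-class recall without
# weighting by class frequency, i.e. macro-averaged recall.
from sklearn.metrics import f1_score, recall_score

# Placeholder labels for a single evaluation dataset (not real data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

uar = recall_score(y_true, y_pred, average="macro")
f1_macro = f1_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.3f} | F1 Macro: {f1_macro:.3f}")
```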
@@ -63,13 +73,12 @@ logits = model(**inputs).logits
 ```bibtex
 @INPROCEEDINGS{koudounas2025icassp,
   author={Koudounas, Alkis and La Quatra, Moreno and Siniscalchi, Sabato Marco and Baralis, Elena},
-  booktitle={ICASSP 2025 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   title={voc2vec: A Foundation Model for Non-Verbal Vocalization},
   year={2025},
   volume={},
   number={},
-  pages={},
-  keywords={},
-  doi={}}
-
+  pages={1-5},
+  keywords={Pediatrics;Accuracy;Foundation models;Benchmark testing;Signal processing;Data models;Acoustics;Speech processing;Nonverbal vocalization;Representation Learning;Self-Supervised Models;Pre-trained Models},
+  doi={10.1109/ICASSP49660.2025.10890672}}
 ```
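The usage section itself is elided from this diff; only its final line, `logits = model(**inputs).logits`, is visible in the hunk context above. A minimal loading sketch consistent with that line, assuming the standard transformers audio-classification API (the audio file name is a placeholder, and the classification head is randomly initialized until the model is fine-tuned):

```python
# Hedged sketch: load a voc2vec checkpoint for audio classification.
# Assumes the standard transformers API; "vocalization.wav" is a placeholder.
import torch
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "alkiskoudounas/voc2vec"  # any checkpoint from the table above

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
# Note: the classification head is newly initialized until fine-tuned.
model = AutoModelForAudioClassification.from_pretrained(model_id)

# wav2vec 2.0-style models operate on 16 kHz mono audio.
waveform, _ = librosa.load("vocalization.wav", sr=16000)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # matches the line shown in the diff
predicted_class = logits.argmax(dim=-1).item()
```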
 