---
license: apache-2.0
base_model:
- facebook/wav2vec2-base
tags:
- intent-classification
- slu
- audio-classification
metrics:
- accuracy
- f1
model-index:
- name: wav2vec2-base-unslurp-gold
  results: []
datasets:
- slurp
language:
- en
pipeline_tag: audio-classification
library_name: transformers
---

# wav2vec2-base-UNSLURP-gold

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the UNSLURP dataset (retain set) for the intent classification task.
SLURP does not provide speaker-independent splits, which machine unlearning techniques require to be effective: the speaker identities in the retain, forget, and test sets must be mutually exclusive to apply and evaluate unlearning methods correctly.
To address this, we propose new speaker-independent splits. In the following, we refer to the new dataset as SLURP*, or UNSLURP.

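For reference, the following is a minimal sketch of how speaker-disjoint retain/forget/test splits can be built. It only illustrates the constraint described above, not the exact procedure used to create UNSLURP, and it assumes each example is a dict with a hypothetical `speaker_id` field:

```python
import random
from collections import defaultdict

def speaker_independent_splits(examples, forget_frac=0.1, test_frac=0.2, seed=0):
    """Partition examples so that no speaker appears in more than one split.

    `examples` is assumed to be a list of dicts with a `speaker_id` key
    (hypothetical field name; adapt it to your data format).
    """
    # Group utterances by speaker, then assign whole speakers to splits
    by_speaker = defaultdict(list)
    for ex in examples:
        by_speaker[ex["speaker_id"]].append(ex)

    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_forget = int(len(speakers) * forget_frac)
    n_test = int(len(speakers) * test_frac)
    forget_spk = set(speakers[:n_forget])
    test_spk = set(speakers[n_forget:n_forget + n_test])

    splits = {"retain": [], "forget": [], "test": []}
    for spk, utterances in by_speaker.items():
        if spk in forget_spk:
            splits["forget"].extend(utterances)
        elif spk in test_spk:
            splits["test"].extend(utterances)
        else:
            splits["retain"].extend(utterances)
    return splits
```
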
It achieves the following results on the test set:
- Accuracy: 0.825
- F1: 0.707

## Model description

The base [Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) model is pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.

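If your audio is not already stored at 16 kHz, resample it before feeding it to the model. A minimal sketch with librosa (the file path is a placeholder; the usage example below handles this implicitly by passing `sr=16000` to `librosa.load`):

```python
import librosa

# Load the file at its native sampling rate, then resample to 16 kHz if needed
audio_array, sr = librosa.load("path_to_audio.wav", sr=None)
if sr != 16000:
    audio_array = librosa.resample(audio_array, orig_sr=sr, target_sr=16000)
    sr = 16000
```
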
## Task and dataset description

Intent Classification (IC) classifies utterances into predefined classes to determine the speaker's intent.
The dataset used here is [(UN)SLURP](https://arxiv.org/abs/2011.13205), where each utterance is tagged with two intent labels: action and scenario.

## Usage examples

You can use the model directly in the following manner:
```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file (the model expects 16 kHz input)
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load the fine-tuned model and the feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/wav2vec2-base-unslurp-gold")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits (no gradients needed at inference time)
with torch.no_grad():
    logits = model(**inputs).logits
```
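To turn the logits into a predicted intent, take the argmax over the classes and look the index up in the label mapping stored in the model config. This continues from the snippet above; the exact label strings depend on this checkpoint's configuration, so inspect `model.config.id2label` to see the available intents:

```python
# Map the highest-scoring class index to its intent label
predicted_id = torch.argmax(logits, dim=-1).item()
predicted_intent = model.config.id2label[predicted_id]
print(f"Predicted intent: {predicted_intent}")
```
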

## Framework versions

- Datasets 3.2.0
- PyTorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2

## BibTeX entry and citation info

```bibtex
@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025},
}
```