---
license: apache-2.0
base_model:
- facebook/wav2vec2-base
tags:
- intent-classification
- slu
- audio-classification
metrics:
- accuracy
- f1
model-index:
- name: wav2vec2-base-fsc-gold
  results: []
datasets:
- fsc
language:
- en
pipeline_tag: audio-classification
library_name: transformers
---

# wav2vec2-base-FSC-gold

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the FSC dataset (retain set) for the intent classification task.

It achieves the following results on the test set:
- Accuracy: 0.992
- F1: 0.993

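These are standard classification metrics; for reference, here is a minimal sketch of how accuracy and F1 can be computed with scikit-learn. The label arrays are placeholders, and the weighted averaging mode is an assumption, since the card does not state which F1 variant was used:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder predicted and gold intent-class indices;
# replace with real test-set outputs
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
# The averaging mode is an assumption; the card does not specify it
print("F1:", f1_score(y_true, y_pred, average="weighted"))
```
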
## Model description

This model builds on the base [Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) model from Facebook, pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.

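If your recordings are stored at a different sampling rate, resample them before inference. A minimal sketch with torchaudio, as an alternative to the librosa-based loading shown in the usage example below (the file path is a placeholder):

```python
import torchaudio
import torchaudio.functional as F

# Load audio at its native rate; "path_to_audio.wav" is a placeholder path
waveform, sr = torchaudio.load("path_to_audio.wav")

# Resample to the 16 kHz rate the model expects, if needed
if sr != 16000:
    waveform = F.resample(waveform, orig_freq=sr, new_freq=16000)
```
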
## Task and dataset description

Intent Classification (IC) assigns each utterance to one of a set of predefined classes in order to determine the speaker's intent.
The dataset used here is [Fluent Speech Commands (FSC)](https://arxiv.org/pdf/1904.03670), where each utterance is tagged with three intent labels: action, object, and location.

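Since this checkpoint uses a single audio-classification head, the three slots are presumably flattened into one intent class per utterance. You can inspect the label inventory through the model config; a small sketch (the exact label strings depend on how the classifier head was set up during fine-tuning):

```python
from transformers import AutoConfig

# Inspect the intent label inventory of the fine-tuned checkpoint
config = AutoConfig.from_pretrained("alkiskoudounas/wav2vec2-base-fsc-gold")
print(config.num_labels)   # number of intent classes
print(config.id2label)     # mapping from class index to intent label
```
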
## Usage examples

You can use the model directly in the following manner:
```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file at the 16 kHz rate the model expects
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/wav2vec2-base-fsc-gold")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model.eval()

# Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits (no gradients needed at inference time)
with torch.no_grad():
    logits = model(**inputs).logits
```

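To turn the logits into a predicted intent, you can use the `id2label` mapping that `transformers` stores in the model config; a minimal follow-up to the snippet above:

```python
# Pick the highest-scoring class and map it back to its intent label
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```

Alternatively, the same steps can be wrapped in a `pipeline`; a sketch, assuming the feature extractor is taken from the base model as in the snippet above:

```python
from transformers import pipeline

# The feature extractor is loaded from the base model, mirroring the example above
classifier = pipeline(
    "audio-classification",
    model="alkiskoudounas/wav2vec2-base-fsc-gold",
    feature_extractor="facebook/wav2vec2-base",
)
print(classifier("path_to_audio.wav"))  # placeholder path
```
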
## Framework versions

- Datasets 3.2.0
- PyTorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2

## BibTeX entry and citation info

```bibtex
@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025},
}
```