---
license: apache-2.0
base_model:
- facebook/wav2vec2-base
tags:
- intent-classification
- slu
- audio-classification
metrics:
- accuracy
- f1
model-index:
- name: wav2vec2-base-fsc-gold
  results: []
datasets:
- fsc
language:
- en
pipeline_tag: audio-classification
library_name: transformers
---

# wav2vec2-base-FSC-GOLD (Retain Set)

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the retain set of the FSC dataset for the intent classification task.

It achieves the following results on the test set:
- Accuracy: 0.992
- F1: 0.993

## Model description

The base model is [Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/), pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.

## Task and dataset description

Intent Classification (IC) assigns each utterance to one of a set of predefined classes to determine the speaker's intent. The dataset used here is [Fluent Speech Commands (FSC)](https://arxiv.org/pdf/1904.03670), where each utterance is tagged with three intent labels: action, object, and location.

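To make the three-slot labelling concrete, the sketch below shows one way to flatten (action, object, location) triples into single class names with integer ids. It is only an illustration: the `Intent` class, the `build_label_maps` helper, the underscore-joined label format, and the slot values are all hypothetical, and this card does not state how the checkpoint actually encodes its classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """One FSC-style intent: a combination of three slot values."""
    action: str
    obj: str
    location: str

    def as_label(self, sep: str = "_") -> str:
        # Join the three slots into one class name (hypothetical format).
        return sep.join((self.action, self.obj, self.location))


def build_label_maps(intents):
    """Assign one class index per distinct (action, object, location) triple."""
    labels = sorted({intent.as_label() for intent in intents})
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


# Illustrative slot values only; they are not read from the dataset files.
examples = [
    Intent("activate", "lights", "kitchen"),
    Intent("increase", "heat", "bedroom"),
    Intent("activate", "lights", "kitchen"),  # duplicate triple, same class
]
label2id, id2label = build_label_maps(examples)
```

Under this assumed scheme, two utterances with the same triple share one class, so a single-label classifier can predict all three slots at once.
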
## Usage examples

You can use the model directly as follows:

```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampling to the 16 kHz rate the model expects
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/wav2vec2-base-fsc-gold")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model.eval()

# Extract features
inputs = feature_extractor(
    audio_array.squeeze(),
    sampling_rate=feature_extractor.sampling_rate,
    padding=True,
    return_tensors="pt",
)

# Compute logits without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class; the name comes from the checkpoint's
# config and may be a generic "LABEL_<n>" if no mapping was saved
predicted_id = logits.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_id]
```

## Framework versions

- Datasets 3.2.0
- Pytorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2

## BibTeX entry and citation info

```bibtex
@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025},
}
```