metadata
license: apache-2.0
base_model:
  - facebook/hubert-base-ls960
tags:
  - intent-classification
  - slu
  - audio-classification
datasets:
  - fluent-speech-commands
metrics:
  - accuracy
  - f1
model-index:
  - name: hubert-base-fsc-gold
    results: []
language:
  - en
pipeline_tag: audio-classification
library_name: transformers

HuBERT-base-FSC-GOLD (Retain Set)

This model is a fine-tuned version of facebook/hubert-base-ls960, trained on the retain set of the Fluent Speech Commands (FSC) dataset for the intent classification task.

It achieves the following results on the test set:

  • Accuracy: 0.990
  • F1: 0.991

Model description

The base model is Facebook's HuBERT, pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.

Task and dataset description

Intent Classification (IC) classifies utterances into predefined classes to determine the intent of speakers. The dataset used here is Fluent Speech Commands (FSC), where each utterance is tagged with three intent labels: action, object, and location.
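
To make the three-slot label structure concrete, the sketch below models one FSC-style annotation and joins its slots into a single class string. The `Intent` class, the `as_class` helper, the underscore separator, and the example values are illustrative assumptions; they are not taken from the dataset files or from this model's actual label encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """One utterance annotation: action, object, and location slots (hypothetical container)."""
    action: str
    obj: str
    location: str

    def as_class(self) -> str:
        # Join the three slots into one class string (separator is an assumption).
        return "_".join((self.action, self.obj, self.location))


# Illustrative values only.
example = Intent(action="activate", obj="lights", location="kitchen")
print(example.as_class())  # prints "activate_lights_kitchen"
```

Treating the triple as one class turns the task into ordinary single-label classification; whether this checkpoint encodes its labels this way should be checked against its configuration.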

Usage examples

You can use the model directly in the following manner:

import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

## Load an audio file
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

## Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/hubert-base-fsc-gold")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")

## Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

## Compute logits (gradients are not needed for inference)
model.eval()
with torch.no_grad():
    logits = model(**inputs).logits
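
The logits computed above are raw scores, one per class. A minimal sketch of turning them into a label and a probability follows, assuming the checkpoint uses a single classification head and that its configuration carries an `id2label` mapping. The `decode_prediction` helper and the toy values are hypothetical and written in plain Python so the example runs without downloading the model.

```python
import math
from typing import Dict, Sequence, Tuple


def decode_prediction(scores: Sequence[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-scoring label and its softmax probability."""
    peak = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    best = max(range(len(scores)), key=lambda i: exps[i])
    return id2label[best], exps[best] / total


# Toy values standing in for the model's logits and label mapping.
toy_scores = [0.1, 2.3, -1.0]
toy_labels = {0: "label_a", 1: "label_b", 2: "label_c"}
label, prob = decode_prediction(toy_scores, toy_labels)
print(label, round(prob, 3))
```

With the real model, the call would look like `decode_prediction(logits[0].detach().tolist(), model.config.id2label)`, provided the assumptions above hold for this checkpoint.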

Framework versions

  • Datasets 3.2.0
  • PyTorch 2.1.2
  • Tokenizers 0.20.3
  • Transformers 4.45.2

BibTeX entry and citation info

@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025}
}