---
license: apache-2.0
base_model:
- facebook/hubert-base-ls960
tags:
- intent-classification
- slu
- audio-classification
metrics:
- accuracy
- f1
model-index:
- name: hubert-base-unslurp-gold
  results: []
datasets:
- unslurp
language:
- en
pipeline_tag: audio-classification
library_name: transformers
---

# HuBERT-base-UNSLURP-GOLD (Retain Set)

This model is a fine-tuned version of [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960) on the UNSLURP dataset (retain set) for the intent classification task.

SLURP does not provide speaker-independent splits, which are required for machine unlearning techniques to be effective: the speaker identities in the retain, forget, and test sets must be mutually exclusive to successfully apply and evaluate unlearning methods. To address this, we propose new speaker-independent splits. In the following, we refer to the new dataset as SLURP*, or UNSLURP.
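
As an illustration of the constraint, here is a minimal sketch of a speaker-exclusive partition; the `speaker_ids` input, split fractions, and seed are illustrative assumptions, not the exact procedure used to build UNSLURP:

```python
import random

def split_speakers(speaker_ids, retain_frac=0.8, forget_frac=0.1, seed=42):
    """Partition speakers into mutually exclusive retain/forget/test pools."""
    speakers = sorted(set(speaker_ids))
    random.Random(seed).shuffle(speakers)
    n_retain = int(len(speakers) * retain_frac)
    n_forget = int(len(speakers) * forget_frac)
    retain = set(speakers[:n_retain])
    forget = set(speakers[n_retain:n_retain + n_forget])
    test = set(speakers[n_retain + n_forget:])
    # Every utterance is then routed to the split its speaker belongs to,
    # so no identity appears in more than one split.
    return retain, forget, test
```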
It achieves the following results on the test set:
- Accuracy: 0.826
- F1: 0.704
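
For reference, here is a minimal sketch of how such metrics can be computed with scikit-learn; the `average="macro"` setting is an assumption, since the card does not state how F1 is averaged:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder label ids; in practice these come from running the model on the test set
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred, average="macro"))  # averaging mode is an assumption
```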
## Model description

This model builds on [Facebook's HuBERT](https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression) base model, pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
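
If your audio is stored at a different sampling rate, resample it first; a minimal sketch with librosa (the file path is a placeholder):

```python
import librosa

# Load at the file's native rate, then resample to the 16 kHz the model expects
audio, native_sr = librosa.load("path_to_audio.wav", sr=None)
audio_16k = librosa.resample(audio, orig_sr=native_sr, target_sr=16000)
```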
## Task and dataset description

Intent Classification (IC) assigns utterances to predefined classes in order to determine the speaker's intent.
The dataset used here is [(UN)SLURP](https://arxiv.org/abs/2011.13205), where each utterance is tagged with two intent labels: action and scenario.
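
In SLURP-style annotation, these two labels are often combined into a single intent class; a minimal sketch of that convention (the label strings are illustrative):

```python
# Illustrative scenario/action pair combined into one intent class
scenario, action = "alarm", "set"
intent = f"{scenario}_{action}"  # e.g. "alarm_set"
print(intent)
```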
## Usage examples

You can use the model directly in the following manner:

```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampled to the 16 kHz rate the model expects
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/hubert-base-unslurp-gold")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")

# Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits without tracking gradients, then map the top score to its label name
with torch.no_grad():
    logits = model(**inputs).logits
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```
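
Alternatively, the high-level `pipeline` API can wrap the same steps; a minimal sketch, assuming the checkpoint resolves a compatible feature extractor:

```python
from transformers import pipeline

classifier = pipeline("audio-classification", model="alkiskoudounas/hubert-base-unslurp-gold")
print(classifier("path_to_audio.wav"))
```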
## Framework versions

- Datasets 3.2.0
- Pytorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2
## BibTeX entry and citation info

```bibtex
@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025}
}
```