
	---
	license: apache-2.0
	task_categories:
	- audio-classification
	language:
	- it
	tags:
	- intent
	- intent-classification
	- audio-classification
	- audio
	pretty_name: ITALIC
	size_categories:
	- 10K<n<100K
	base_model:
	- jonatasgrosman/wav2vec2-large-xlsr-53-italian
	model-index:
	- name: xls-r-53-it-italic-speaker
	results: []
	datasets:
	- RiTA-nlp/ITALIC
	library_name: transformers
	---

	# wav2vec 2.0 XLS-R 53-IT (300m) fine-tuned on ITALIC - "Hard Speaker"

	ITALIC is the first intent classification dataset for the Italian language.
	It includes spoken and written utterances annotated with 60 intents.
	The dataset is available on [Zenodo](https://zenodo.org/record/8040649), and connectors are available for the [HuggingFace Hub](https://huggingface.co/datasets/RiTA-nlp/ITALIC).

	This model is [jonatasgrosman/wav2vec2-large-xlsr-53-italian](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-italian) fine-tuned on the "Hard Speaker" split of ITALIC.

	It achieves the following results on the test set:

	- Accuracy: 0.837
	- F1: 0.778
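The card does not say how the F1 score is averaged across the 60 intents. As a minimal sketch, assuming macro averaging (an assumption, not something the card confirms), the two metrics can be computed as below. The labels `"a"` and `"b"` are hypothetical placeholders, not ITALIC intents.

```python
def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted intent matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example with hypothetical labels
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Under macro averaging, every class counts equally, so a rare intent that is often misclassified lowers F1 more than it lowers accuracy.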

	## Usage

	You can use the model directly as follows:

	```python
	import torch
	import librosa
	from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

	# Load an audio file, resampled to 16 kHz
	audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

	# Load the fine-tuned model and the base model's feature extractor
	model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/xls-r-53-it-italic-speaker")
	feature_extractor = AutoFeatureExtractor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-italian")

	# Extract features
	inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

	# Compute logits without tracking gradients (inference only)
	with torch.no_grad():
	    logits = model(**inputs).logits
	```
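The snippet above stops at the raw logits. As a hedged sketch of one way to turn them into ranked intent predictions, the helper below applies a softmax and returns the top `k` labels. `top_intents` is a hypothetical helper written for illustration, and the toy labels are placeholders rather than the real ITALIC label set.

```python
import math


def top_intents(logits_row, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one utterance."""
    # Numerically stable softmax: subtract the max logit before exponentiating
    max_logit = max(logits_row)
    exps = [math.exp(x - max_logit) for x in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by descending probability
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example with three hypothetical intent labels
toy_labels = {0: "alarm_set", 1: "weather_query", 2: "play_music"}
toy_logits = [0.2, 2.5, -1.0]
print(top_intents(toy_logits, toy_labels, k=2))
```

With the model loaded as in the usage example, a call such as `top_intents(logits[0].detach().tolist(), model.config.id2label)` should work. Whether this checkpoint's config stores readable intent names or generic `LABEL_n` placeholders is not stated in the card.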

	For more information about the dataset, please refer to the [paper](https://arxiv.org/abs/2306.08502).

	## Citation

	If you use this model in your research, please cite the following papers:

	```bibtex
	@inproceedings{koudounas2023italic,
	title={ITALIC: An Italian Intent Classification Dataset},
	author={Koudounas, Alkis and La Quatra, Moreno and Vaiani, Lorenzo and Colomba, Luca and Attanasio, Giuseppe and Pastor, Eliana and Cagliero, Luca and Baralis, Elena},
	booktitle={Proc. Interspeech 2023},
	pages={2153--2157},
	year={2023}
	}

	@inproceedings{koudounas2025unlearning,
	title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
	author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
	booktitle={Proc. Interspeech 2025},
	year={2025},
	}
	```