---
license: apache-2.0
task_categories:
- audio-classification
language:
- it
tags:
- intent
- intent-classification
- audio-classification
- audio
pretty_name: ITALIC
size_categories:
- 10K<n<100K
base_model:
- jonatasgrosman/wav2vec2-large-xlsr-53-italian
model-index:
- name: xls-r-53-it-italic-speaker
  results: []
datasets:
- RiTA-nlp/ITALIC
library_name: transformers
---

# wav2vec 2.0 XLS-R 53-IT (300M) fine-tuned on ITALIC - "Hard Speaker"

ITALIC is the first intent classification dataset for the Italian language.
It includes spoken and written utterances annotated with 60 intents.
The dataset is available on [Zenodo](https://zenodo.org/record/8040649), and connectors are available for the [Hugging Face Hub](https://huggingface.co/datasets/RiTA-nlp/ITALIC).

This model is [jonatasgrosman/wav2vec2-large-xlsr-53-italian](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-italian) fine-tuned on the "Hard Speaker" split of ITALIC.

It achieves the following results on the test set:

- Accuracy: 0.837
- F1: 0.778

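Accuracy is the fraction of test utterances assigned the correct intent. The card does not say how F1 is averaged across the 60 intents; macro averaging (the unweighted mean of per-intent F1) is a common choice for intent classification and is assumed in the minimal sketch below. The function names and the toy intent labels are hypothetical and are not taken from the ITALIC evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted intent equals the gold intent."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Assumed metric: unweighted mean of per-intent F1 over all intents
    that appear in either the gold or the predicted labels."""
    labels = set(y_true) | set(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    gold_count = Counter(y_true)
    scores = []
    for label in labels:
        # Undefined precision/recall (zero denominator) is treated as 0
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not real ITALIC predictions
y_true = ["alarm_set", "alarm_set", "weather_query", "play_music", "play_music"]
y_pred = ["alarm_set", "weather_query", "weather_query", "play_music", "alarm_set"]
print(accuracy(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

Because macro averaging gives every intent equal weight, errors on rare intents pull macro F1 down more than they pull down accuracy. On the same inputs, this sketch should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.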
## Usage

You can use the model directly as follows:

```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampled to 16 kHz
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/xls-r-53-it-italic-speaker")
feature_extractor = AutoFeatureExtractor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-italian")

# Extract features
inputs = feature_extractor(
    audio_array.squeeze(),
    sampling_rate=feature_extractor.sampling_rate,
    padding=True,
    return_tensors="pt",
)

# Compute logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index to its label name from the model config
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```

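The usage snippet produces raw logits, one score per intent. To see how confident the model is, or which intents come next, a softmax turns the logits into probabilities. Below is a minimal dependency-free sketch; the helper `top_k_intents` and the 4-class toy label map are hypothetical. With the real model you would pass `logits[0].tolist()` and `model.config.id2label`.

```python
import math


def top_k_intents(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one utterance.

    `logits` is a flat list of floats for a single example;
    `id2label` maps integer class indices to label names.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first, and keep the top k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Hypothetical toy label map and logits, not produced by the real model
toy_labels = {0: "alarm_set", 1: "weather_query", 2: "play_music", 3: "calendar_query"}
for label, prob in top_k_intents([2.0, 0.5, 1.0, -1.0], toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

If you prefer to stay in PyTorch, `torch.softmax(logits, dim=-1)` followed by `torch.topk` performs the same computation on the tensor directly.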
For more information about the dataset, please refer to the [paper](https://arxiv.org/abs/2306.08502).

## Citation

If you use this model in your research, please cite the following papers:

```bibtex
@inproceedings{koudounas2023italic,
  title={ITALIC: An Italian Intent Classification Dataset},
  author={Koudounas, Alkis and La Quatra, Moreno and Vaiani, Lorenzo and Colomba, Luca and Attanasio, Giuseppe and Pastor, Eliana and Cagliero, Luca and Baralis, Elena},
  booktitle={Proc. Interspeech 2023},
  pages={2153--2157},
  year={2023}
}

@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025}
}
```