license: cc-by-4.0
metrics:
- accuracy
- f1
- uar
pipeline_tag: audio-classification
tags:
- audio
- audio-classification
- acoustic-scene-classification
- autrainer
library_name: autrainer
model-index:
- name: dcase-2020-t1a-cnn14-32k-t
  results:
  - task:
      type: audio-classification
      name: Acoustic Scene Classification
    metrics:
    - type: accuracy
      name: Accuracy
      value: 0.6778975741239892
    - type: f1
      name: F1
      value: 0.6749168062342605
    - type: uar
      name: Unweighted Average Recall
      value: 0.6778357903357903
Acoustic Scene Classification Model
A CNN14 model from the PANN family that classifies audio files into one of the following 10 acoustic scenes:
airport, bus, metro, metro_station, park, public_square, shopping_mall, street_pedestrian, street_traffic, and tram.
Installation
To use the model, you have to install autrainer, e.g. via pip:
pip install autrainer
Usage
The model can be applied to all audio files in a folder (<data-root>) and stores the predictions in another folder (<output-root>):
autrainer inference hf:autrainer/dcase2020-t1a-cnn14-32k-t <data-root> <output-root>
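For batch jobs it can be convenient to issue the same command from a script. The following is a minimal sketch that only wraps the CLI call shown above with Python's standard library; the helper names `build_inference_command` and `run_inference` are hypothetical and not part of autrainer.

```python
import shutil
import subprocess

# Model identifier exactly as used in the CLI command above.
MODEL_ID = "hf:autrainer/dcase2020-t1a-cnn14-32k-t"


def build_inference_command(data_root: str, output_root: str) -> list[str]:
    """Assemble the CLI call from the Usage section as an argument list."""
    return ["autrainer", "inference", MODEL_ID, data_root, output_root]


def run_inference(data_root: str, output_root: str) -> None:
    """Run the CLI; raises if autrainer is not installed or the call fails."""
    if shutil.which("autrainer") is None:
        raise RuntimeError("autrainer not found on PATH; install it via pip first")
    subprocess.run(build_inference_command(data_root, output_root), check=True)
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.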
Training
Pretraining
The model was originally trained on AudioSet by Kong et al.
Dataset
The model has been further trained (finetuned) on the training set of the DCASE 2020 Task 1A dataset. The dataset comprises 10 different acoustic scenes recorded in 12 European cities with real and simulated devices. The audio recordings were provided as 10-second segments with a sample rate of 48 kHz.
Features
The DCASE 2020 Task 1A dataset was resampled to 32 kHz, as this was the sampling rate of AudioSet, which the model was pretrained on. Then, log-Mel spectrograms were extracted with torchlibrosa using the parameters that the upstream model was trained on.
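To make the resulting input size concrete, the sketch below works through the arithmetic for one 10-second clip. The window size (1024), hop size (320), and number of Mel bins (64) are assumptions based on the commonly published 32 kHz CNN14 settings and are not confirmed by this card; the centered-frame formula likewise assumes the STFT pads the signal at both ends.

```python
from fractions import Fraction

SOURCE_SR = 48_000   # sample rate stated in the Dataset section
TARGET_SR = 32_000   # sample rate after resampling
CLIP_SECONDS = 10    # segment length stated in the Dataset section

# Assumed upstream feature parameters (not confirmed by this card).
WINDOW_SIZE = 1024
HOP_SIZE = 320
MEL_BINS = 64


def resampled_length(n_samples: int, source_sr: int, target_sr: int) -> int:
    """Number of samples after resampling, using exact rational arithmetic."""
    return int(n_samples * Fraction(target_sr, source_sr))


def num_frames(n_samples: int, hop_size: int, center: bool = True,
               window_size: int = WINDOW_SIZE) -> int:
    """Frame count of an STFT; center=True assumes padding at both ends."""
    if center:
        return n_samples // hop_size + 1
    return (n_samples - window_size) // hop_size + 1


n_in = SOURCE_SR * CLIP_SECONDS
n_out = resampled_length(n_in, SOURCE_SR, TARGET_SR)
frames = num_frames(n_out, HOP_SIZE)
print(f"{n_in} samples -> {n_out} samples -> log-Mel of shape ({frames}, {MEL_BINS})")
```

Under these assumptions, each clip shrinks from 480000 to 320000 samples and yields 1001 frames of 64 Mel bins.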
Training Process
The model has been trained for 50 epochs.
At the end of each epoch, the model was evaluated on the validation set.
We release the state that achieved the best performance on this validation set.
All training hyperparameters can be found in the main configuration file (conf/config.yaml).
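The checkpoint selection described above can be sketched as follows. This is an illustration, not autrainer's implementation: the `Checkpoint` class, the `select_best` helper, and the example scores are hypothetical, and the sketch assumes the tracked validation metric is one where higher is better.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    epoch: int        # 1-based epoch index
    val_score: float  # validation metric after that epoch (higher is better)


def select_best(val_scores: list[float]) -> Checkpoint:
    """Return the epoch with the highest validation score.

    Ties are resolved in favor of the earliest epoch.
    """
    best = None
    for epoch, score in enumerate(val_scores, start=1):
        if best is None or score > best.val_score:
            best = Checkpoint(epoch=epoch, val_score=score)
    if best is None:
        raise ValueError("at least one epoch is required")
    return best


# Hypothetical validation scores for a short run, for illustration only.
example_scores = [0.52, 0.61, 0.66, 0.64, 0.66]
print(select_best(example_scores))
```

Because only the best state is kept, the released weights need not come from the final epoch.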
Evaluation
No public test set is provided for the DCASE 2020 Task 1A dataset. Therefore, we evaluate the model on the validation set, where it achieves a classification accuracy of 0.68, an F1 score of 0.67, and an unweighted average recall (UAR) of 0.68.
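For readers unfamiliar with the reported metrics, the sketch below computes accuracy, UAR, and F1 from lists of labels using only the standard library. UAR is the mean of the per-class recalls; averaging F1 per class (macro) is an assumption here, since the card does not state which averaging was used. The function name `classification_metrics` is hypothetical.

```python
from collections import Counter


def classification_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy, unweighted average recall, and macro-averaged F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    recalls, f1_scores = [], []
    for label in sorted(set(y_true)):  # every class here has at least one sample
        recall = tp[label] / (tp[label] + fn[label])
        predicted = tp[label] + fp[label]
        precision = tp[label] / predicted if predicted else 0.0
        denom = precision + recall
        recalls.append(recall)
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "uar": sum(recalls) / len(recalls),
        "f1": sum(f1_scores) / len(f1_scores),
    }


# Toy example with two of the ten scene labels.
print(classification_metrics(["bus", "bus", "bus", "tram"],
                             ["bus", "bus", "tram", "tram"]))
```

Unlike accuracy, UAR weights every class equally, so it is not inflated when some scenes have more samples than others.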
Acknowledgements
Please acknowledge the work which produced the original model and the DCASE 2020 Task 1A dataset. We would also appreciate an acknowledgment to autrainer.