---
license: cc-by-4.0
metrics:
- accuracy
- f1
- uar
pipeline_tag: audio-classification
tags:
- audio
- audio-classification
- acoustic-scene-classification
- autrainer
library_name: autrainer
model-index:
- name: dcase-2020-t1a-cnn14-32k-t
  results:
  - task:
      type: audio-classification
      name: Acoustic Scene Classification
    metrics:
    - type: accuracy
      name: Accuracy
      value: 0.6778975741239892
    - type: f1
      name: F1
      value: 0.6749168062342605
    - type: uar
      name: Unweighted Average Recall
      value: 0.6778357903357903
---

# Acoustic Scene Classification Model

`CNN14` model from the [PANN](https://zenodo.org/records/3987831) family that classifies audio files into one of the following 10 different acoustic scenes:
_airport_, _bus_, _metro_, _metro_station_, _park_, _public_square_, _shopping_mall_, _street_pedestrian_, _street_traffic_, and _tram_.

## Installation

To use the model, install autrainer, for example via pip:

```bash
pip install autrainer
```

## Usage

The model can be applied to all audio files in a folder (`<data-root>`); the predictions are stored in another folder (`<output-root>`):

```bash
autrainer inference hf:autrainer/dcase2020-t1a-cnn14-32k-t <data-root> <output-root>
```

## Training

### Pretraining

The model was originally trained on AudioSet by [Kong et al.](https://zenodo.org/records/3987831).

### Dataset

The model was subsequently fine-tuned on the training set of the [DCASE 2020 Task 1A](http://dcase.community/challenge2020/task-acoustic-scene-classification) dataset.
The dataset comprises 10 different acoustic scenes recorded in 12 European cities with real and simulated devices.
The audio recordings were provided as 10-second segments with a sample rate of 48 kHz.

### Features

The DCASE 2020 Task 1A dataset was resampled to 32 kHz to match the sampling rate of AudioSet, on which the model was pretrained.
Log-Mel spectrograms were then extracted with torchlibrosa, using the same parameters the upstream model was trained with.
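
As an illustration, the following sketch reproduces this front-end, assuming the standard CNN14 parameters from the PANN repository (1024-sample Hann window, 320-sample hop, 64 Mel bins, 50 Hz to 14 kHz); the file name is a placeholder, and autrainer's actual preprocessing may differ in detail:

```python
import torchaudio
from torchlibrosa.stft import LogmelFilterBank, Spectrogram

# Load a clip and resample from 48 kHz to 32 kHz ("scene.wav" is a placeholder).
waveform, sr = torchaudio.load("scene.wav")  # shape: (channels, samples)
waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=32000)

# Assumed CNN14 front-end parameters (PANN repository defaults).
spectrogram = Spectrogram(n_fft=1024, hop_length=320, win_length=1024,
                          window="hann", center=True, pad_mode="reflect",
                          freeze_parameters=True)
logmel = LogmelFilterBank(sr=32000, n_fft=1024, n_mels=64, fmin=50, fmax=14000,
                          ref=1.0, amin=1e-10, top_db=None,
                          freeze_parameters=True)

# A mono clip of shape (1, samples) is treated as a batch of one.
features = logmel(spectrogram(waveform))  # shape: (batch, 1, time, 64)
```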

### Training Process

The model was trained for 50 epochs.
At the end of each epoch, it was evaluated on the validation set.
We release the state that achieved the best performance on this validation set.
All training hyperparameters can be found in the main configuration file (`conf/config.yaml`).
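
Schematically, the checkpoint selection described above corresponds to the following sketch; the model, data, and training step are illustrative stand-ins, not autrainer's actual training loop:

```python
import copy
import torch

model = torch.nn.Linear(64, 10)  # stand-in for the fine-tuned CNN14
val_x, val_y = torch.randn(8, 64), torch.randint(0, 10, (8,))  # dummy validation data

best_acc, best_state = -1.0, None
for epoch in range(50):
    # ... one training epoch on the DCASE 2020 Task 1A training set runs here ...
    with torch.no_grad():
        acc = (model(val_x).argmax(dim=1) == val_y).float().mean().item()
    if acc > best_acc:  # keep the state with the best validation performance
        best_acc, best_state = acc, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # the released state
```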

### Evaluation

No public test set is provided for the DCASE 2020 Task 1A dataset; we therefore evaluate the model on the validation set.
On this validation set, the model achieves a classification accuracy of approximately 0.68.
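
For reference, the reported metrics can be computed with scikit-learn along the following lines; the labels are placeholders, and macro averaging for the F1 score is an assumption:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Placeholder labels; in practice, these come from the validation set.
y_true = ["airport", "bus", "park", "tram"]
y_pred = ["airport", "tram", "park", "tram"]

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")       # assuming macro averaging
uar = recall_score(y_true, y_pred, average="macro")  # unweighted average recall
```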

### Acknowledgements

Please acknowledge the work that produced the original model and the DCASE 2020 Task 1A dataset.
We would also appreciate an acknowledgement of autrainer.