---
license: cc-by-4.0
metrics:
- accuracy
- f1
- uar
pipeline_tag: audio-classification
tags:
- audio
- audio-classification
- acoustic-scene-classification
- autrainer
library_name: autrainer
model-index:
- name: dcase-2020-t1a-cnn14-32k-t
  results:
  - task:
      type: audio-classification
      name: Acoustic Scene Classification
    metrics:
    - type: accuracy
      name: Accuracy
      value: 0.6778975741239892
    - type: f1
      name: F1
      value: 0.6749168062342605
    - type: uar
      name: Unweighted Average Recall
      value: 0.6778357903357903
---

# Acoustic Scene Classification Model

`CNN14` model from the [PANN](https://zenodo.org/records/3987831) family that classifies audio files into one of 10 acoustic scenes: _airport_, _bus_, _metro_, _metro_station_, _park_, _public_square_, _shopping_mall_, _street_pedestrian_, _street_traffic_, and _tram_.

## Installation

To use the model, install autrainer, e.g. via pip:

```bash
pip install autrainer
```

## Usage

The model can be applied to all audio files present in a folder (``) and stores the predictions in another folder (``):

```bash
autrainer inference hf:autrainer/dcase2020-t1a-cnn14-32k-t
```

## Training

### Pretraining

The model was originally trained on AudioSet by [Kong et al.](https://zenodo.org/records/3987831).

### Dataset

The model was then finetuned on the training set of the [DCASE 2020 Task 1A](http://dcase.community/challenge2020/task-acoustic-scene-classification) dataset. The dataset comprises 10 acoustic scenes recorded in 12 European cities with real and simulated devices. The audio recordings are provided as 10-second segments with a sample rate of 48 kHz.

### Features

The DCASE 2020 Task 1A recordings were resampled to 32 kHz, the sampling rate of the AudioSet data on which the model was pretrained. Log-Mel spectrograms were then extracted with torchlibrosa using the same parameters that the upstream model was trained with.

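
To make the resampling step concrete, the sketch below works out the size of the model input for a single clip. The sample rates and the 10-second clip length come from this card. The hop size of 320 samples, the 64 Mel bins, and centered framing are assumptions, since this card does not list the upstream feature parameters. The helper names are hypothetical and not part of autrainer or torchlibrosa.

```python
from fractions import Fraction

ORIG_SR = 48_000    # DCASE 2020 Task 1A sample rate (from this card)
TARGET_SR = 32_000  # AudioSet / model sample rate (from this card)
CLIP_SECONDS = 10   # segment length (from this card)
HOP_SIZE = 320      # ASSUMPTION: STFT hop size in samples
N_MELS = 64         # ASSUMPTION: number of Mel bins


def resample_ratio(orig_sr: int, target_sr: int) -> Fraction:
    """Reduced up/down factor of a rational resampler (target / original)."""
    return Fraction(target_sr, orig_sr)


def resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Number of samples after resampling, rounded up to a whole sample."""
    return (n_samples * target_sr + orig_sr - 1) // orig_sr


def centered_frame_count(n_samples: int, hop_size: int) -> int:
    """Frame count of an STFT whose frames are centered on padded input."""
    return n_samples // hop_size + 1


ratio = resample_ratio(ORIG_SR, TARGET_SR)
n_in = CLIP_SECONDS * ORIG_SR
n_out = resampled_length(n_in, ORIG_SR, TARGET_SR)
frames = centered_frame_count(n_out, HOP_SIZE)

print(ratio)             # 2/3 (upsample by 2, downsample by 3)
print(n_in, n_out)       # 480000 320000
print((frames, N_MELS))  # (1001, 64) under the assumed parameters
```

Under these assumptions, each 10-second clip becomes a log-Mel spectrogram of 1001 frames by 64 Mel bins. Other upstream parameters would change only the last two numbers.
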
### Training Process

The model was trained for 50 epochs. At the end of each epoch, it was evaluated on the validation set. We release the state that achieved the best performance on this validation set. All training hyperparameters can be found in the main configuration file (`conf/config.yaml`).

### Evaluation

No public test set is provided for the DCASE 2020 Task 1A dataset, so we evaluate the model on the validation set. There, the model achieves an accuracy of 0.678, an F1 score of 0.675, and an unweighted average recall (UAR) of 0.678.

### Acknowledgements

Please acknowledge the work that produced the original model and the DCASE 2020 Task 1A dataset. We would also appreciate an acknowledgment of autrainer.
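
For readers who want to reproduce the three metrics listed in this card's metadata (accuracy, F1, and UAR) from their own predictions, the following is a minimal sketch in plain Python. It assumes that F1 is macro-averaged over classes, which this card does not state. The function name and the toy labels are hypothetical, and this is not autrainer's implementation.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro F1, UAR) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    recalls, f1_scores = [], []
    for label in labels:
        support = tp[label] + fn[label]    # samples whose true class is label
        predicted = tp[label] + fp[label]  # samples predicted as label
        recall = tp[label] / support if support else 0.0
        precision = tp[label] / predicted if predicted else 0.0
        denom = precision + recall
        recalls.append(recall)
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(f1_scores) / len(labels)  # ASSUMPTION: macro averaging
    uar = sum(recalls) / len(labels)         # unweighted mean of class recalls
    return accuracy, macro_f1, uar


# Toy example with three of the ten scene labels.
y_true = ["park", "park", "bus", "bus", "tram", "tram"]
y_pred = ["park", "bus", "bus", "bus", "tram", "park"]
accuracy, macro_f1, uar = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} f1={macro_f1:.3f} uar={uar:.3f}")
```

UAR weights every class equally regardless of how many samples it has, so it matches accuracy when the classes are evenly represented, as in the toy example.
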