---
license: cc-by-4.0
metrics:
  - accuracy
  - f1-micro
  - f1-macro
  - f1-weighted
pipeline_tag: audio-classification
tags:
  - audio
  - audio-classification
  - ecoacoustic-tagging
  - autrainer
library_name: autrainer
model-index:
  - name: edansa-2019-cnn10-32k-t
    results:
      - task:
          type: audio-classification
          name: Ecoacoustic Tagging
        metrics:
          - type: accuracy
            name: Accuracy
            value: 0.6968486462494452
          - type: f1-micro
            name: Micro F1
            value: 0.8765212229148116
          - type: f1-macro
            name: Macro F1
            value: 0.8614431334513389
          - type: f1-weighted
            name: Weighted F1
            value: 0.8706722471821455
---

# ABGS Ecoacoustic Tagging Model

This model tags audio files with one or more of the following labels: anthropophony (A), biophony (B), geophony (G), and silence (S).

## Installation

To use the model, first install autrainer, e.g. via pip:

```bash
pip install autrainer
```

## Usage

The model can be applied to all WAV files in a folder (`<data-root>`), with the predictions stored in another folder (`<output-root>`):

```bash
autrainer inference hf:autrainer/edansa-2019-cnn10-32k-t <data-root> <output-root>
```

## Training

### Pretraining

The model was originally trained on AudioSet by Kong et al.

### Dataset

The model was then fine-tuned on the training set of the EDANSA2019 dataset. The dataset was collected on the North Slope of Alaska, at latitudes between 64°N and 70°N and longitudes between 139°W and 150°W, from a total of 40 recording devices, each placed at a different location separated by ca. 20 km from the others. A subset of the entire dataset was annotated with 28 labels (tags), of which only the 4 highest-level categories were used: anthropophony, biophony, geophony, and silence. The original sampling rate was 48 kHz.

### Features

The EDANSA2019 dataset was resampled to 32 kHz to match the sampling rate of AudioSet, on which the model was originally trained. Log-Mel spectrograms were then extracted with torchlibrosa, using the same parameters as the upstream model.
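For illustration, the log-Mel extraction can be sketched in plain NumPy. This is not the exact torchlibrosa pipeline; the parameters here (32 kHz input, 1024-sample Hann window, hop of 320, 64 Mel bins, 50–14000 Hz) follow the common PANNs/Cnn10 convention and are assumptions, not values read from this model's config:

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)  # rising slope
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)  # falling slope
    return fb

def log_mel(audio, sr=32000, n_fft=1024, hop=320, n_mels=64,
            fmin=50, fmax=14000, eps=1e-10):
    # Frame the signal with a Hann window and take the power spectrum
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the mel filterbank and take the log
    mel = power @ mel_filterbank(sr, n_fft, n_mels, fmin, fmax).T
    return np.log(mel + eps)

# One second of noise at 32 kHz -> (n_frames, 64) log-Mel matrix
spec = log_mel(np.random.default_rng(0).standard_normal(32000))
```

The resulting matrix of shape `(frames, mel_bins)` is what CNN-style audio taggers consume in place of the raw waveform.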

### Training process

The model was trained for 30 epochs. At the end of each epoch, it was evaluated on the official validation set; we release the checkpoint that achieved the best performance on this validation set. All training hyperparameters can be found in `conf/config.yaml` inside the model folder.

### Evaluation

The model has only been evaluated on in-domain data. On the official test set, it reached a weighted F1-score of 0.87.
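The card reports micro, macro, and weighted F1. For readers unfamiliar with the distinction, the following minimal sketch shows how the three averages are computed for a multi-label problem over the four tags; the predictions here are made up purely for illustration and are not outputs of this model:

```python
# Toy multi-label truth/predictions over the four tags (A, B, G, S)
y_true = [
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0],
    [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0],
]
y_pred = [
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0],
    [0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0],
]

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall (0 when undefined)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

n_labels = 4
tp, fp, fn, support = [[0] * n_labels for _ in range(4)]
for t, p in zip(y_true, y_pred):
    for j in range(n_labels):
        support[j] += t[j]
        if t[j] and p[j]:
            tp[j] += 1
        if p[j] and not t[j]:
            fp[j] += 1
        if t[j] and not p[j]:
            fn[j] += 1

# Micro: pool the counts across all labels before computing F1
micro = f1(sum(tp), sum(fp), sum(fn))
# Macro: unweighted mean of the per-label F1 scores
per_label = [f1(tp[j], fp[j], fn[j]) for j in range(n_labels)]
macro = sum(per_label) / n_labels
# Weighted: per-label F1 scores weighted by label support
weighted = sum(s * f for s, f in zip(support, per_label)) / sum(support)
print(round(micro, 3), round(macro, 3), round(weighted, 3))  # 0.769 0.667 0.762
```

Micro-averaging favors frequent labels, macro-averaging treats all labels equally, and weighting by support sits in between, which is why the three values reported in the metadata above differ.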

## Acknowledgments

Please acknowledge the work that produced the original model and the EDANSA2019 dataset. We would also appreciate an acknowledgment of autrainer.