Article

Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

Journal

PeerJ
Volume 2, Article e488

Publisher

PeerJ Inc.
DOI: 10.7717/peerj.488

Keywords

Bioacoustics; Machine learning; Birds; Classification; Vocalisation; Birdsong

Funding

  1. EPSRC Leadership Fellowship [EP/G007144/1]
  2. EPSRC Early Career Fellowship [EP/L020505/1]
  3. EPSRC [EP/G007144/1, EP/L020505/1] Funding Source: UKRI
  4. Engineering and Physical Sciences Research Council [EP/L020505/1, EP/G007144/1] Funding Source: researchfish

Abstract

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and unsupervised, meaning it requires no manual data labelling, yet it can improve performance on supervised tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. We study the interaction between dataset characteristics and choice of feature representation through further empirical analysis.
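
The abstract does not spell out the learning algorithm or software stack, so the sketch below is an illustrative reconstruction only: it assumes k-means dictionary learning over whitened Mel-spectrogram patches (a common choice for this kind of unsupervised feature learning), max-pooled activations per recording, and a random forest classifier, implemented with librosa and scikit-learn. The library choices, patch sizes and dictionary size are assumptions for illustration, not the authors' exact pipeline.

    # Illustrative sketch, not the paper's exact method: unsupervised feature
    # learning on Mel spectra followed by random-forest classification.
    import numpy as np
    import librosa
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier

    N_MELS, PATCH_FRAMES, N_BASES = 40, 4, 500   # assumed sizes for the sketch

    def mel_spectrogram(path):
        """Log-magnitude Mel spectrogram (n_mels x frames) of one recording."""
        y, sr = librosa.load(path, sr=22050)
        S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
        return librosa.power_to_db(S)

    def extract_patches(S):
        """Slide a short window along time, flattening each patch to a vector."""
        return np.array([S[:, t:t + PATCH_FRAMES].ravel()
                         for t in range(S.shape[1] - PATCH_FRAMES + 1)])

    def learn_dictionary(train_paths):
        """Unsupervised step: whiten spectrogram patches, cluster them into bases."""
        patches = np.vstack([extract_patches(mel_spectrogram(p)) for p in train_paths])
        pca = PCA(whiten=True).fit(patches)
        km = KMeans(n_clusters=N_BASES, n_init=4).fit(pca.transform(patches))
        return pca, km

    def pooled_features(path, pca, km):
        """Project patches onto the learned bases and max-pool over time."""
        patches = pca.transform(extract_patches(mel_spectrogram(path)))
        activations = -km.transform(patches)      # negative distance as activation
        return activations.max(axis=0)            # one fixed-length vector per file

    def train_classifier(train_paths, labels):
        pca, km = learn_dictionary(train_paths)   # uses audio only, no labels
        X = np.array([pooled_features(p, pca, km) for p in train_paths])
        clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
        return pca, km, clf

The dictionary-learning step touches no labels, which is what allows this kind of feature learning to exploit large unannotated archives; once pca, km and clf are fitted, classifying a new recording costs one pooled_features call plus clf.predict, so no extra complexity is added at prediction time.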
