3.8 Proceedings Paper

Deep-learning-based central African primate species classification with MixUp and SpecAugment

Journal

INTERSPEECH 2021
Pages 456-460

Publisher

ISCA (International Speech Communication Association)
DOI: 10.21437/Interspeech.2021-1911

Keywords

automated species classification; primate vocalizations; audio data augmentation

Funding

  1. French ANR agency within the LUDAU project [ANR-18-CE23-0005-01]
  2. French Investing for the Future - PIA3, AI Interdisciplinary Institute ANITI [ANR-19-PI3A-0004]
  3. CALMIP [2020-p20022]
  4. Agence Nationale de la Recherche (ANR) [ANR-18-CE23-0005]

Abstract

This paper reports experiments on the automatic classification of primate vocalizations using several deep neural network architectures, data augmentation techniques, and a balanced data sampler, resulting in improved performance. The best model achieved high accuracy on both the development and test sets, surpassing the official baseline. Additional fusion and classification experiments based on embeddings did not yield better results.

In this paper, we report experiments in which we aim to automatically classify primate vocalizations into four primate species of interest, plus a background category of forest sound events. We compare several standard deep neural network architectures: deep convolutional neural networks (CNNs), MobileNets, and ResNets. To tackle the small size of the training dataset (fewer than seven thousand audio files), the data augmentation techniques SpecAugment and MixUp proved very useful. To counter the heavily imbalanced classes of the dataset, we used a balanced data sampler, which proved effective. An exponential moving average of the model weights brought slight further gains. The best model was a standard 10-layer CNN with about five million parameters. It achieved a 93.6% Unweighted Average Recall (UAR) on the development set and generalized well to the test set with a 92.5% UAR, outperforming the official baseline of 86.6%. We quantify the performance gains brought by the augmentations and training tricks, and report fusion and embedding-based classification experiments that did not bring better results.
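
The two augmentations named in the title are easy to sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes log-mel spectrogram batches in PyTorch, and the Beta parameter, mask counts, and mask widths are arbitrary illustrative values rather than the settings used in the paper.

```python
import torch

def mixup(specs, onehot_labels, alpha=0.2):
    """MixUp: blend pairs of examples and their one-hot labels with a Beta-drawn weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(specs.size(0))
    mixed_specs = lam * specs + (1.0 - lam) * specs[perm]
    mixed_labels = lam * onehot_labels + (1.0 - lam) * onehot_labels[perm]
    return mixed_specs, mixed_labels

def spec_augment(spec, n_freq_masks=2, max_freq_width=8,
                 n_time_masks=2, max_time_width=20):
    """SpecAugment-style masking: zero out random mel bands and time spans of one spectrogram."""
    spec = spec.clone()
    n_mels, n_frames = spec.shape[-2], spec.shape[-1]
    for _ in range(n_freq_masks):
        width = int(torch.randint(0, max_freq_width + 1, (1,)))
        start = int(torch.randint(0, max(1, n_mels - width), (1,)))
        spec[..., start:start + width, :] = 0.0
    for _ in range(n_time_masks):
        width = int(torch.randint(0, max_time_width + 1, (1,)))
        start = int(torch.randint(0, max(1, n_frames - width), (1,)))
        spec[..., :, start:start + width] = 0.0
    return spec

# Dummy batch: 8 clips, 64 mel bands, 500 frames, 5 classes
# (four primate species plus the background class).
specs = torch.randn(8, 64, 500)
labels = torch.nn.functional.one_hot(torch.randint(0, 5, (8,)), num_classes=5).float()
specs, labels = mixup(specs, labels)
specs = torch.stack([spec_augment(s) for s in specs])
```

As for the other training tricks mentioned in the abstract, a balanced data sampler is commonly realized with something like PyTorch's WeightedRandomSampler driven by inverse class frequencies, and the UAR metric is the macro-averaged (unweighted per-class) recall; these are generic descriptions of the techniques, not details taken from the paper beyond what the abstract states.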
