3.8 Proceedings Paper

Deep-learning-based central African primate species classification with MixUp and SpecAugment

Journal

INTERSPEECH 2021
Pages 456-460

Publisher

ISCA (International Speech Communication Association)
DOI: 10.21437/Interspeech.2021-1911

Keywords

automated species classification; primate vocalizations; audio data augmentation

Funding

  1. French ANR agency within the LUDAU project [ANR-18-CE23-0005-01]
  2. French Investing for the Future - PIA3, AI Interdisciplinary Institute ANITI [ANR-19-PI3A-0004]
  3. CALMIP [2020-p20022]
  4. Agence Nationale de la Recherche (ANR) [ANR-18-CE23-0005]

Abstract

This paper reports experiments on the automatic classification of primate vocalizations using several deep neural network architectures, data augmentation techniques, and a balanced data sampler, resulting in improved performance. The best model achieved high accuracy on both the development and test sets, surpassing the official baseline. Additional fusion and classification experiments based on embeddings did not yield better results.

In this paper, we report experiments in which we aim to automatically classify primate vocalizations into four primate species of interest, plus a background category of forest sound events. We compare several standard deep neural network architectures: deep convolutional neural networks (CNNs), MobileNets, and ResNets. To tackle the small size of the training dataset (fewer than seven thousand audio files), the data augmentation techniques SpecAugment and MixUp proved very useful. To counter the heavily imbalanced classes of the dataset, we used a balanced data sampler, which proved effective. An exponential moving average of the model weights brought slight further gains. The best model was a standard 10-layer CNN with about five million parameters. It achieved a 93.6% Unweighted Average Recall (UAR) on the development set and generalized well to the test set with a 92.5% UAR, outperforming the official baseline of 86.6%. We quantify the performance gains brought by the augmentations and training tricks, and report fusion and embedding-based classification experiments that did not bring better results.
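
The two augmentations named in the title are easy to sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes log-mel spectrogram batches in PyTorch, and the Beta parameter, mask counts, and mask widths are arbitrary illustrative values rather than the settings used in the paper.

```python
import torch

def mixup(specs, onehot_labels, alpha=0.2):
    """MixUp: blend pairs of examples and their one-hot labels with a Beta-drawn weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(specs.size(0))
    mixed_specs = lam * specs + (1.0 - lam) * specs[perm]
    mixed_labels = lam * onehot_labels + (1.0 - lam) * onehot_labels[perm]
    return mixed_specs, mixed_labels

def spec_augment(spec, n_freq_masks=2, max_freq_width=8,
                 n_time_masks=2, max_time_width=20):
    """SpecAugment-style masking: zero out random mel bands and time spans of one spectrogram."""
    spec = spec.clone()
    n_mels, n_frames = spec.shape[-2], spec.shape[-1]
    for _ in range(n_freq_masks):
        width = int(torch.randint(0, max_freq_width + 1, (1,)))
        start = int(torch.randint(0, max(1, n_mels - width), (1,)))
        spec[..., start:start + width, :] = 0.0
    for _ in range(n_time_masks):
        width = int(torch.randint(0, max_time_width + 1, (1,)))
        start = int(torch.randint(0, max(1, n_frames - width), (1,)))
        spec[..., :, start:start + width] = 0.0
    return spec

# Dummy batch: 8 clips, 64 mel bands, 500 frames, 5 classes
# (four primate species plus the background class).
specs = torch.randn(8, 64, 500)
labels = torch.nn.functional.one_hot(torch.randint(0, 5, (8,)), num_classes=5).float()
specs, labels = mixup(specs, labels)
specs = torch.stack([spec_augment(s) for s in specs])
```

As for the other training tricks mentioned in the abstract, a balanced data sampler is commonly realized with something like PyTorch's WeightedRandomSampler driven by inverse class frequencies, and the UAR metric is the macro-averaged (unweighted per-class) recall; these are generic descriptions of the techniques, not details taken from the paper beyond what the abstract states.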
