Combining visual and acoustic features for audio classification tasks

PATTERN RECOGNITION LETTERS (2017)

Journal

PATTERN RECOGNITION LETTERS

Volume 88, Pages 49-56

Publisher

ELSEVIER

DOI: 10.1016/j.patrec.2017.01.013

Keywords

Audio classification; Texture; Image processing; Acoustic features; Ensemble of classifiers; Pattern recognition

Abstract

This paper presents a novel and effective approach to automated audio classification based on the fusion of different sets of features, both visual and acoustic. A number of different acoustic and visual features of sounds are evaluated and compared. These features are then fused in an ensemble that achieves better classification accuracy than other state-of-the-art approaches. The visual features are built from the audio file: they are extracted from images constructed from different spectrograms, a gammatonegram, and a rhythm image. These images are divided into sub-windows, from which a set of texture descriptors is extracted. A separate Support Vector Machine (SVM) is trained for each feature descriptor, and the SVM outputs are summed to reach a final decision. The proposed ensemble is evaluated on three well-known music genre classification databases (the Latin Music Database, the ISMIR 2004 database, and the GTZAN genre collection), a dataset of bird vocalizations for species recognition, and a dataset of right whale calls for whale detection. The MATLAB code for the ensemble of classifiers and for the extraction of the features will be publicly available (https://www.dei.unipd.it/node/2357 + Pattern Recognition and Ensemble Classifiers). © 2017 Elsevier B.V. All rights reserved.
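The two mechanical steps named in the abstract, dividing an image into sub-windows and summing the per-descriptor SVM outputs, can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names, the 2 x 3 grid, the descriptor names, and the score values are all hypothetical, and the per-class scores simply stand in for the outputs of already-trained SVMs.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]


def split_into_subwindows(image: Image, n_rows: int, n_cols: int) -> List[Image]:
    """Split a 2-D image (a list of pixel rows) into n_rows x n_cols zones."""
    height, width = len(image), len(image[0])
    # Integer division spreads any remainder across the zones.
    row_edges = [r * height // n_rows for r in range(n_rows + 1)]
    col_edges = [c * width // n_cols for c in range(n_cols + 1)]
    zones: List[Image] = []
    for r in range(n_rows):
        for c in range(n_cols):
            zone = [row[col_edges[c]:col_edges[c + 1]]
                    for row in image[row_edges[r]:row_edges[r + 1]]]
            zones.append(zone)
    return zones


def sum_rule(score_sets: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Add up the per-class scores produced by several classifiers."""
    fused: Dict[str, float] = {}
    for scores in score_sets:
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + score
    return fused


def predict(score_sets: Sequence[Dict[str, float]]) -> str:
    """Return the class with the highest summed score."""
    fused = sum_rule(score_sets)
    return max(fused, key=fused.get)


# Hypothetical 4 x 6 "spectrogram" image, split into a 2 x 3 grid of zones.
toy_image = [[float(x + 6 * y) for x in range(6)] for y in range(4)]
toy_zones = split_into_subwindows(toy_image, n_rows=2, n_cols=3)

# Hypothetical per-class scores from three classifiers, one per descriptor.
toy_scores = [
    {"tango": 0.6, "salsa": 0.4},  # classifier on texture descriptor A
    {"tango": 0.1, "salsa": 0.9},  # classifier on texture descriptor B
    {"tango": 0.7, "salsa": 0.3},  # classifier on an acoustic feature set
]
toy_decision = predict(toy_scores)
```

In this toy example two of the three classifiers favour "tango", yet the summed scores favour "salsa", which shows how a sum rule differs from majority voting: a single confident classifier can outweigh two weakly confident ones. In practice the descriptor extraction and the SVM training would sit between the two steps shown here.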
