Article

Combining visual and acoustic features for audio classification tasks

Journal

PATTERN RECOGNITION LETTERS
Volume 88, Pages 49-56

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2017.01.013

Keywords

Audio classification; Texture; Image processing; Acoustic features; Ensemble of classifiers; Pattern recognition


This paper presents a novel and effective approach to automated audio classification based on the fusion of different sets of features, both visual and acoustic. A number of acoustic and visual features of sounds are evaluated and compared, and these features are then fused in an ensemble that achieves better classification accuracy than other state-of-the-art approaches. The visual features are computed from images built from the audio file: several different spectrograms, a gammatonegram, and a rhythm image. Each image is divided into sub-windows, from which a set of texture descriptors is extracted. A separate Support Vector Machine (SVM) is trained for each feature descriptor, and the outputs of the SVMs are summed to reach the final decision. The proposed ensemble is evaluated on three well-known music genre classification databases (the Latin Music Database, the ISMIR 2004 database, and the GTZAN genre collection), a dataset of bird vocalizations for species recognition, and a dataset of right whale calls for whale detection. The MATLAB code for the ensemble of classifiers and for the feature extraction will be made publicly available (https://www.deLunipclit/node/2357 +Pattern Recognition and Ensemble Classifiers). (C) 2017 Elsevier B.V. All rights reserved.
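The pipeline summarized in the abstract (split an image into sub-windows, extract a texture descriptor from each, train one SVM per descriptor, sum the SVM outputs) can be sketched as below. This is a minimal NumPy-only sketch under stated assumptions, not the authors' MATLAB code: the horizontal-band zoning, the basic 8-neighbour LBP used as the example texture descriptor, the z-score normalization applied before summing, and the helper names `split_subwindows`, `lbp_histogram`, `describe` and `sum_rule` are all hypothetical choices made for illustration. SVM training itself is omitted; the sketch assumes each descriptor's SVM has already produced an `(n_samples, n_classes)` score matrix.

```python
from __future__ import annotations

import numpy as np


def split_subwindows(image: np.ndarray, n_windows: int) -> list[np.ndarray]:
    """Split a time-frequency image into n_windows bands along axis 0.
    Assumption: the paper's actual sub-window layout may differ."""
    return np.array_split(image, n_windows, axis=0)


def lbp_histogram(window: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour local binary pattern as a normalized 256-bin histogram.
    Used here only as an example texture descriptor (window must be >= 3x3)."""
    h, w = window.shape
    centre = window[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        # Neighbour pixel aligned with each centre pixel.
        neighbour = window[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / max(hist.sum(), 1.0)


def describe(image: np.ndarray, n_windows: int) -> np.ndarray:
    """Concatenate the per-sub-window descriptors into one feature vector."""
    return np.concatenate([lbp_histogram(win) for win in split_subwindows(image, n_windows)])


def sum_rule(score_matrices: list[np.ndarray]) -> np.ndarray:
    """Fuse per-classifier score matrices by summation and return class indices.
    Each matrix is z-normalized first (an assumption) so no single SVM dominates."""
    total = np.zeros(np.shape(score_matrices[0]), dtype=float)
    for scores in score_matrices:
        scores = np.asarray(scores, dtype=float)
        std = scores.std()
        total += (scores - scores.mean()) / (std if std > 0 else 1.0)
    return total.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectrogram = rng.random((64, 128))  # stand-in for a real spectrogram image
    features = describe(spectrogram, n_windows=4)
    print(features.shape)  # (1024,)
```

In a full system, one classifier per descriptor (for example scikit-learn's `SVC`, whose `decision_function` returns per-class scores) would supply the score matrices passed to `sum_rule`. Normalizing before summing is a common safeguard when classifiers' score ranges differ; plain summation of raw scores is the simpler alternative.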
