☆ 4.2 Article

Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2006)

Journal

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume 14, Issue 3, Pages 920-930

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TSA.2005.858055

Keywords

audio classification and segmentation; auditory model; speech discrimination

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals in high level of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002 and Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.

Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

Journal

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

Journal

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper