Exploiting correlogram structure for robust speech recognition with multiple speech sources

SPEECH COMMUNICATION (2007)

Journal

SPEECH COMMUNICATION

Volume 49, Issue 12, Pages 874-891

Publisher

ELSEVIER

DOI: 10.1016/j.specom.2007.05.003

Keywords

speech separation; robust speech recognition; multiple pitch tracking; computational auditory scene analysis; correlogram; speech fragment decoding

Funding

Engineering and Physical Sciences Research Council [GR/T04823/01] Funding Source: researchfish

Abstract

This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture in the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at the delays that correspond to multiple pitch periods. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are passed to a 'speech fragment decoder', which uses 'missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported, which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments over different conditions, which results in significantly better recognition accuracy. © 2007 Elsevier B.V. All rights reserved.
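
The correlogram idea behind the first stage can be illustrated with a minimal sketch. This is not the authors' algorithm: it does not search for the tree-like structures, and it replaces the auditory filterbank with a hypothetical helper, `toy_channels`, that emits one pure tone per channel. The function names, the 8 kHz sampling rate, the 640-sample frame, and the candidate lags are all assumptions made for illustration. The sketch computes a per-channel autocorrelation (one correlogram frame), reads a pitch period off the summed correlogram, and labels each channel by the candidate pitch lag it supports most strongly.

```python
import numpy as np

def toy_channels(freqs_hz, fs, n_samples):
    """Hypothetical stand-in for filterbank outputs: one pure tone per channel."""
    t = np.arange(n_samples) / fs
    return np.array([np.sin(2 * np.pi * f * t) for f in freqs_hz])

def correlogram_frame(channels, max_lag):
    """Normalised autocorrelation per channel, shape (n_channels, max_lag)."""
    n_channels, n_samples = channels.shape
    acg = np.zeros((n_channels, max_lag))
    for c in range(n_channels):
        x = channels[c] - channels[c].mean()
        for lag in range(max_lag):
            # Inner product of the frame with a copy of itself delayed by `lag`.
            acg[c, lag] = np.dot(x[:n_samples - lag], x[lag:])
        if acg[c, 0] > 0:
            acg[c] /= acg[c, 0]  # scale so that lag 0 equals 1
    return acg

def estimate_pitch_lag(acg, min_lag):
    """Lag (in samples) of the largest summary-correlogram value at or above min_lag."""
    summary = acg.sum(axis=0)  # sum across frequency channels
    return min_lag + int(np.argmax(summary[min_lag:]))

def group_channels(acg, candidate_lags):
    """Label each channel with the index of the candidate lag it supports most."""
    return np.argmax(acg[:, candidate_lags], axis=1)

fs = 8000    # assumed sampling rate in Hz
frame = 640  # assumed frame length in samples (80 ms)

# One periodic source: harmonics of 100 Hz, i.e. a pitch period of 80 samples.
single = toy_channels([100, 200, 300, 400], fs, frame)
pitch_lag = estimate_pitch_lag(correlogram_frame(single, max_lag=160), min_lag=20)
print(pitch_lag, fs / pitch_lag)

# Two sources: harmonics of 100 Hz (period 80) and of 125 Hz (period 64).
mixture = toy_channels([100, 200, 300, 125, 250, 375], fs, frame)
labels = group_channels(correlogram_frame(mixture, max_lag=160), [80, 64])
print(labels)
```

In this toy setting every channel contains exactly one harmonic, so the grouping is unambiguous. Real filterbank channels can respond to energy from several sources at once, which is the harder case the paper's correlogram-structure analysis is designed to handle.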
