Article

Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes

Journal

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
Volume 129, Issue 1, Pages 388-403

Publisher

ACOUSTICAL SOC AMER AMER INST PHYSICS
DOI: 10.1121/1.3514525

Funding

  1. Deutsche Forschungsgemeinschaft within the collaborative research centre "The active auditory system"

The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems. © 2011 Acoustical Society of America. [DOI: 10.1121/1.3514525]
