Article

Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes

Journal

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
Volume 129, Issue 1, Pages 388-403

Publisher

ACOUSTICAL SOC AMER AMER INST PHYSICS
DOI: 10.1121/1.3514525


Funding

  1. Deutsche Forschungsgemeinschaft, within the collaborative research centre "The active auditory system"


The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system, with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. A second aim is to investigate whether the most common ASR features contain all information required to recognize speech in noisy environments, by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate of the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems. © 2011 Acoustical Society of America. [DOI: 10.1121/1.3514525]
