Article

Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA (2012)

Journal

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA

Volume 131, Issue 5, Pages 4134-4151

Publisher

ACOUSTICAL SOC AMER AMER INST PHYSICS

DOI: 10.1121/1.3699200

Category

Acoustics; Audiology & Speech-Language Pathology

Funding

DFG [SFB/TRR 31]
German Academic Exchange Service (DAAD)

Abstract

In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically interpretable features. Robustness against extrinsic variation (different types of additive noise) and intrinsic variability (arising from changes in speaking rate, effort, and style) is quantified in a series of recognition experiments. The results are compared to reference ASR systems using Mel-frequency cepstral coefficients (MFCCs), MFCCs with cepstral mean subtraction (CMS) and RASTA-PLP features, respectively. Gabor features are shown to be more robust against extrinsic variation than the baseline systems without CMS, with relative improvements of 28% and 16% for two training conditions (using only clean training samples or a mixture of noisy and clean utterances, respectively). When used in a state-of-the-art system, improvements of 14% are observed when spectro-temporal features are concatenated with MFCCs, indicating the complementarity of those feature types. An analysis of the importance of specific MF shows that temporal MF up to 25 Hz and spectral MF up to 0.25 cycles/channel are beneficial for ASR. © 2012 Acoustical Society of America. [http://dx.doi.org/10.1121/1.3699200]
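To make the idea of a two-dimensional Gabor filter tuned to one spectro-temporal modulation frequency more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a frame rate of 100 Hz, a Hann-shaped envelope, and an illustrative filter size of 23 channels by 41 frames, none of which are stated in the abstract. The function names `gabor_filter`, `ripple`, and `response` are hypothetical. The sketch shows only that a filter tuned to a given temporal and spectral MF responds much more strongly to a matching ripple pattern than a filter tuned elsewhere.

```python
import cmath
import math

# Assumption: 10 ms frame shift (100 frames per second); not given in the abstract.
FRAME_RATE_HZ = 100.0


def gabor_filter(omega_t_hz, omega_s_cyc, n_frames, n_channels):
    """Complex 2D Gabor filter as a list of rows (channels x frames).

    omega_t_hz:  temporal modulation frequency in Hz
    omega_s_cyc: spectral modulation frequency in cycles/channel
    """
    k_center = (n_channels - 1) / 2.0
    n_center = (n_frames - 1) / 2.0
    rows = []
    for k in range(n_channels):
        # Hann envelope along the spectral axis (assumed envelope shape).
        env_k = 0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (n_channels + 1))
        row = []
        for n in range(n_frames):
            # Hann envelope along the temporal axis.
            env_n = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (n_frames + 1))
            # Complex sinusoid carrying the spectral and temporal modulation.
            phase = 2 * math.pi * (
                omega_s_cyc * (k - k_center)
                + omega_t_hz / FRAME_RATE_HZ * (n - n_center)
            )
            row.append(env_k * env_n * cmath.exp(1j * phase))
        rows.append(row)
    return rows


def ripple(omega_t_hz, omega_s_cyc, n_frames, n_channels):
    """Synthetic spectro-temporal ripple standing in for a spectrogram patch."""
    return [
        [
            math.cos(2 * math.pi * (omega_s_cyc * k + omega_t_hz / FRAME_RATE_HZ * n))
            for n in range(n_frames)
        ]
        for k in range(n_channels)
    ]


def response(patch, filt):
    """Magnitude of the inner product between a patch and a filter."""
    total = sum(
        p * f.conjugate()
        for patch_row, filt_row in zip(patch, filt)
        for p, f in zip(patch_row, filt_row)
    )
    return abs(total)


# Illustrative sizes only: 23 channels by 41 frames.
N_CHANNELS, N_FRAMES = 23, 41
patch = ripple(10.0, 0.125, N_FRAMES, N_CHANNELS)
matched = response(patch, gabor_filter(10.0, 0.125, N_FRAMES, N_CHANNELS))
mismatched = response(patch, gabor_filter(25.0, 0.25, N_FRAMES, N_CHANNELS))
print(matched > mismatched)
```

In a full feature extraction pipeline, a bank of such filters would be applied at every position of a spectro-temporal representation of the speech signal. The sketch evaluates a single patch position only.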
