Article

Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition

SPEECH COMMUNICATION (2017)

Journal

SPEECH COMMUNICATION

Volume 89, Issue -, Pages 103-112

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.specom.2017.03.003

Keywords

Automatic speech recognition; Articulatory trajectories; Vocal tract variables; Hybrid convolutional neural networks; Time-frequency convolution; Convolutional neural networks

Categories

Acoustics; Computer Science, Interdisciplinary Applications

Funding

NSF [IIS-0964556, IIS-1162046, IIS-1161962]
Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems [1162033] (funding source: National Science Foundation)


Abstract

Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. However, learning speaker-invariant articulatory models is challenging: speaker-specific signatures in both the articulatory and acoustic spaces increase the complexity of the speech-to-articulatory mapping, which is already an ill-posed problem because it is inherently nonlinear and non-unique. This work explores using deep neural networks (DNNs) and convolutional neural networks (CNNs) to map speech data into its corresponding articulatory space. Our speech-inversion results indicate that the CNN models perform better than their DNN counterparts. In addition, we use these inverse models to generate articulatory information from speech for two separate continuous speech recognition tasks: WSJ1 and Aurora-4. This work proposes a hybrid convolutional neural network (HCNN), in which two parallel layers jointly model the acoustic and articulatory spaces, and the decisions from the parallel layers are fused at the output context-dependent (CD) state level. The acoustic model performs time-frequency convolution on filterbank-energy-level features, whereas the articulatory model performs time convolution on the articulatory features. The proposed architecture is compared with CNN- and DNN-based systems that use gammatone filterbank energies as acoustic features, and the results indicate that the HCNN-based model achieves lower word error rates than the CNN/DNN baseline systems. © 2017 Elsevier B.V. All rights reserved.
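
The two-stream idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the layer sizes, the random weights, the single shared time kernel for the articulatory stream, and the choice to fuse by summing per-stream logits before one softmax are all assumptions made here for illustration, since the abstract does not specify them. All function and parameter names (`hcnn_posteriors`, `k_tf`, `k_t`, and so on) are hypothetical.

```python
import math
import random

def conv2d_valid(x, k):
    """Valid 2-D cross-correlation over a (time x frequency) patch."""
    T, F = len(x), len(x[0])
    kt, kf = len(k), len(k[0])
    return [[sum(x[t + i][f + j] * k[i][j] for i in range(kt) for j in range(kf))
             for f in range(F - kf + 1)]
            for t in range(T - kt + 1)]

def conv1d_time(x, k):
    """Valid 1-D cross-correlation along time only, per feature channel."""
    T, D = len(x), len(x[0])
    kt = len(k)
    return [[sum(x[t + i][d] * k[i] for i in range(kt)) for d in range(D)]
            for t in range(T - kt + 1)]

def relu_flatten(m):
    """Apply ReLU and flatten a 2-D feature map into a vector."""
    return [max(0.0, v) for row in m for v in row]

def linear(v, w, b):
    """Fully connected layer: one output per row of w."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def make_params(rng, T, F, D, n_states, kt=3, kf=3):
    """Random toy weights; sizes follow from the valid-convolution output shapes."""
    n_a = (T - kt + 1) * (F - kf + 1)   # acoustic stream feature count
    n_b = (T - kt + 1) * D              # articulatory stream feature count
    return {
        "k_tf": rand_matrix(rng, kt, kf),
        "k_t": [rng.uniform(-0.1, 0.1) for _ in range(kt)],
        "w_a": rand_matrix(rng, n_states, n_a), "b_a": [0.0] * n_states,
        "w_b": rand_matrix(rng, n_states, n_b), "b_b": [0.0] * n_states,
    }

def hcnn_posteriors(acoustic, articulatory, params):
    """Two parallel streams fused at the output (CD-state) level.

    Assumption: fusion is a sum of per-stream logits followed by a softmax.
    """
    a = relu_flatten(conv2d_valid(acoustic, params["k_tf"]))    # time-frequency conv
    b = relu_flatten(conv1d_time(articulatory, params["k_t"]))  # time-only conv
    logits_a = linear(a, params["w_a"], params["b_a"])
    logits_b = linear(b, params["w_b"], params["b_b"])
    return softmax([x + y for x, y in zip(logits_a, logits_b)])

rng = random.Random(0)
T, F, D, N_STATES = 5, 8, 4, 6  # toy sizes, not taken from the paper
acoustic = rand_matrix(rng, T, F, scale=1.0)      # stand-in for filterbank energies
articulatory = rand_matrix(rng, T, D, scale=1.0)  # stand-in for estimated trajectories
params = make_params(rng, T, F, D, N_STATES)
posteriors = hcnn_posteriors(acoustic, articulatory, params)
```

The point of the sketch is the asymmetry between the streams: the acoustic kernel slides over both time and frequency, while the articulatory kernel slides over time only, so each articulatory channel keeps its identity until the output layer combines the two streams.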
