☆ 4.2 Article

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2012)

Journal

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume 20, Issue 1, Pages 30-42

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TASL.2011.2134090

Keywords

Artificial neural network-hidden Markov model (ANN-HMM); context-dependent phone; deep belief network; deep neural network hidden Markov model (DNN-HMM); speech recognition; large-vocabulary speech recognition (LVSR)

Categories

Acoustics; Engineering, Electrical & Electronic

Abstract

We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.
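
The absolute and relative figures quoted in the abstract are linked by simple arithmetic: a relative error reduction equals the absolute gain divided by the baseline sentence error rate. The short Python sketch below works backwards from the four quoted percentages to the baseline error rates they imply. The function and variable names are illustrative assumptions, and the derived values are approximations inferred from rounded figures, not numbers quoted from the paper.

```python
def implied_baseline_error(absolute_gain_pct: float, relative_reduction_pct: float) -> float:
    """Baseline sentence error rate (%) implied by an absolute accuracy gain
    and the corresponding relative error reduction, both given in percent."""
    return absolute_gain_pct / (relative_reduction_pct / 100.0)


# Figures quoted in the abstract, keyed by the CD-GMM-HMM training criterion:
# (absolute sentence accuracy gain %, relative error reduction %)
reported = {"MPE": (5.8, 16.0), "ML": (9.2, 23.2)}

# Derived quantities (approximate, because the inputs are rounded).
implied = {}
for criterion, (gain, reduction) in reported.items():
    baseline_error = implied_baseline_error(gain, reduction)
    baseline_accuracy = 100.0 - baseline_error
    implied[criterion] = {
        "baseline_error": baseline_error,
        "baseline_accuracy": baseline_accuracy,
        "cd_dnn_hmm_accuracy": baseline_accuracy + gain,
    }

for criterion, values in implied.items():
    print(criterion, {name: round(value, 2) for name, value in values.items()})
```

Both baselines lead to roughly the same implied CD-DNN-HMM sentence accuracy (about 69.5%), which suggests the two pairs of figures describe the same CD-DNN-HMM system measured against two different baselines.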
