Article

A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition

Journal

NEUROCOMPUTING
Volume 218, Issue -, Pages 448-459

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2016.09.018

Keywords

Transfer learning; Speaker adaptation; Deep neural network; Multi-task learning

Abstract

In this paper, we present a unified approach to transfer learning of deep neural networks (DNNs) to address performance degradation caused by a potential acoustic mismatch between training and testing conditions due to inter-speaker variability in state-of-the-art connectionist (a.k.a., hybrid) automatic speech recognition (ASR) systems. Different schemes for transferring knowledge of deep neural networks related to speaker adaptation can be developed with ease under such a unifying concept, as demonstrated in the three frameworks investigated in this study. In the first solution, knowledge is transferred between homogeneous domains, namely the source and the target domains; the transfer takes place in a sequential manner from the source to the target speaker to boost ASR accuracy on spoken utterances from a surprise target speaker. In the second solution, a multi-task approach is adopted to adjust the connectionist parameters and improve ASR performance on the target speaker. Knowledge is transferred simultaneously among heterogeneous tasks by adding one or more smaller auxiliary output layers to the original DNN structure. In the third solution, DNN output classes are organised into a hierarchical structure so that the connectionist parameters can be adjusted to close the gap between training and testing conditions, transferring prior knowledge from the root node to the leaves in a structural maximum a posteriori fashion. Through a series of experiments on the Wall Street Journal (WSJ) speech recognition task, we show that the proposed solutions yield consistent and statistically significant word error rate reductions. Most importantly, we show that transfer learning is an enabling technology for speaker adaptation, since it outperforms both the transformation-based adaptation algorithms usually adopted in the speech community and multi-condition training (MCT), a data combination method often adopted to cover more acoustic variability in speech when data from the source and target domains are both available at training time. Finally, experimental evidence demonstrates that all proposed solutions are robust to negative transfer even when only a single sentence from the target speaker is available. (C) 2016 Elsevier B.V. All rights reserved.
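The second and third frameworks in the abstract lend themselves to a compact illustration. Below is a minimal PyTorch sketch of the idea, not the authors' implementation: a shared hidden stack feeds both a primary senone output layer and a smaller auxiliary head (here a hypothetical monophone task), and an L2 pull toward the speaker-independent parameters stands in for the MAP-style prior transfer. The layer sizes, the choice of auxiliary task, and the loss weights alpha and tau are all illustrative assumptions.

# Minimal sketch only; PyTorch is assumed, and the layer sizes,
# auxiliary monophone task, and loss weights are illustrative.
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskAdaptDNN(nn.Module):
    """Hybrid-ASR DNN with a primary senone output layer and a
    smaller auxiliary output layer sharing the same hidden stack."""
    def __init__(self, feat_dim=440, hidden_dim=1024,
                 n_senones=3000, n_aux=40):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.senone_head = nn.Linear(hidden_dim, n_senones)  # primary task
        self.aux_head = nn.Linear(hidden_dim, n_aux)         # e.g. monophones

    def forward(self, x):
        h = self.hidden(x)
        return self.senone_head(h), self.aux_head(h)

def adaptation_loss(model, si_params, x, y_senone, y_aux,
                    alpha=0.3, tau=1e-3):
    """Weighted multi-task cross-entropy plus an L2 pull toward the
    speaker-independent (SI) parameters, a stand-in for the paper's
    MAP-style transfer of prior knowledge from a source model."""
    senone_logits, aux_logits = model(x)
    task_loss = (F.cross_entropy(senone_logits, y_senone)
                 + alpha * F.cross_entropy(aux_logits, y_aux))
    prior = sum((p - p0).pow(2).sum()
                for p, p0 in zip(model.parameters(), si_params))
    return task_loss + tau * prior

In use, si_params would be detached copies of the speaker-independent model's parameters, and adaptation would fine-tune on the (possibly single) target-speaker utterance; the prior term is one way to guard against the negative transfer the abstract mentions.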
