Article

Deep Elman recurrent neural networks for statistical parametric speech synthesis

Journal

SPEECH COMMUNICATION
Volume 93, Pages 31-42

Publisher

ELSEVIER
DOI: 10.1016/j.specom.2017.08.003

Keywords

Speech synthesis; Recurrent neural networks; Deep neural networks; Hidden state

Funding

  1. TCS

Owing to the success of deep learning techniques in automatic speech recognition, deep neural networks (DNNs) have been used as acoustic models for statistical parametric speech synthesis (SPSS). DNNs do not inherently model the temporal structure in speech and text, and hence are not well suited to being directly applied to the problem of SPSS. Recurrent neural networks (RNNs), on the other hand, have the capability to model time series. RNNs with long short-term memory (LSTM) cells have been shown to outperform DNN-based SPSS. However, LSTM cells and their variants, such as gated recurrent units (GRUs) and simplified LSTMs (SLSTMs), have complicated structures and are computationally expensive compared to simple recurrent architectures like the Elman RNN. In this paper, we explore deep Elman RNNs for SPSS and compare their effectiveness against deep gated RNNs. Specifically, we perform experiments to show that (1) deep Elman RNNs are better suited for acoustic modeling in SPSS than DNNs and perform competitively with deep SLSTMs, GRUs and LSTMs, (2) context representation learning using Elman RNNs improves neural network acoustic models for SPSS, and (3) an Elman RNN-based duration model is better than the DNN-based counterpart. Experiments were performed on the Blizzard Challenge 2015 dataset consisting of 3 Indian languages (Telugu, Hindi and Tamil). Through subjective and objective evaluations, we show that our proposed systems outperform the baseline systems across different speakers and languages. (C) 2017 Elsevier B.V. All rights reserved.
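The abstract's claim that the Elman RNN is structurally simpler than gated cells can be seen in a minimal forward-pass sketch. This is illustrative only: the dimensions, initialization, and function names are arbitrary assumptions, not details taken from the paper. The entire recurrence is one hidden-state update, whereas an LSTM or GRU would add several gate weight matrices per step.

```python
import numpy as np

# Minimal Elman RNN forward pass (illustrative sketch, not the paper's model).
# Per step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3  # arbitrary toy dimensions

W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

def elman_forward(xs):
    """Run the Elman recurrence over a sequence of input frames."""
    h = np.zeros(hidden_dim)
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # single recurrent hidden update
        ys.append(W_hy @ h + b_y)               # per-frame output prediction
    return np.stack(ys), h

seq = rng.standard_normal((5, input_dim))        # 5 input frames
outputs, final_h = elman_forward(seq)
print(outputs.shape)  # (5, 3): one output vector per frame
```

In SPSS the inputs would be linguistic feature frames and the outputs acoustic parameters; here they are random vectors, used only to show the shape of the computation. A gated cell would replace the single `tanh` update with three or four gated updates, which is the extra cost the paper's comparison targets.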
