Article

Speech synthesis from ECoG using densely connected 3D convolutional neural networks

Journal

JOURNAL OF NEURAL ENGINEERING
Volume 16, Issue 3, Pages: -

Publisher

IOP PUBLISHING LTD
DOI: 10.1088/1741-2552/ab0c59

Keywords

speech synthesis; neural networks; Wavenet; electrocorticography; brain-computer interfaces; BCI

Funding

  1. BMBF [01GQ1602]
  2. NSF, NSF/NIH/BMBF Collaborative Research in Computational Neuroscience Program [1608140]
  3. Doris Duke Charitable Foundation (Clinical Scientist Development Award) [2011039]
  4. NIH National Center for Advancing Translational Sciences [UL1TR000150, UL1TR001422]
  5. NIH [F32DC015708, R01NS094748]
  6. NSF Division of Information & Intelligent Systems, Directorate for Computer & Information Science & Engineering [1608140]

Abstract

Objective. Direct synthesis of speech from neural signals could provide a fast and natural means of communication for people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the temporal and spatial resolution necessary to decode fast and complex processes such as speech production. Recent years have brought a number of impressive advances in speech decoding from neural signals, but the complex dynamics of speech production are still not fully understood, and it is unlikely that simple linear models can capture the relation between neural activity and continuously spoken speech. Approach. Here we show that deep neural networks can map ECoG from speech production areas onto an intermediate representation of speech (a logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology, which is well suited to the small amount of data available from each participant. Main results. In a study with six participants, we achieved correlations of up to r = 0.69 between the reconstructed and original logMel spectrograms. We transformed our predictions back into an audible waveform by applying a Wavenet vocoder conditioned on logMel features; because the vocoder harnessed a much larger, pre-existing speech corpus, it provided natural-sounding acoustic output. Significance. To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings made during speech production using deep neural networks.
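The abstract describes a two-stage pipeline: a densely connected convolutional network regresses logMel spectrogram frames from spatiotemporal windows of ECoG activity, and a separately trained Wavenet vocoder then turns those frames into audio. Below is a minimal PyTorch sketch of the first stage together with the Pearson-correlation evaluation used in the paper. It is an illustration under stated assumptions, not the authors' implementation: the grid size, window length, number of mel bins, layer sizes, and all names (ECoGDenseNet, DenseBlock3d, mel_correlation) are hypothetical, and the Wavenet vocoder stage is omitted.

import torch
import torch.nn as nn

N_MELS = 40             # assumed number of logMel bins per frame
GRID_H, GRID_W = 8, 8   # assumed spatial layout of the ECoG electrode grid
WIN_T = 9               # assumed temporal context (samples of neural activity)

class DenseBlock3d(nn.Module):
    """Densely connected 3D conv block: each layer receives the
    concatenated feature maps of the input and all earlier layers."""
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm3d(ch),
                nn.ReLU(inplace=True),
                nn.Conv3d(ch, growth, kernel_size=3, padding=1),
            ))
            ch += growth
        self.out_channels = ch  # in_ch + n_layers * growth

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class ECoGDenseNet(nn.Module):
    """Maps one ECoG window (1 x T x H x W) to one logMel frame."""
    def __init__(self):
        super().__init__()
        self.block = DenseBlock3d(in_ch=1, growth=16, n_layers=4)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),   # pool over time and grid dimensions
            nn.Flatten(),
            nn.Linear(self.block.out_channels, N_MELS),
        )

    def forward(self, x):              # x: (batch, 1, T, H, W)
        return self.head(self.block(x))  # (batch, N_MELS)

def mel_correlation(pred, target):
    """Mean Pearson r across mel bins between reconstructed and
    original logMel spectrograms, each shaped (frames, mels)."""
    pred = pred - pred.mean(dim=0)
    target = target - target.mean(dim=0)
    r = (pred * target).sum(dim=0) / (
        pred.norm(dim=0) * target.norm(dim=0) + 1e-8)
    return r.mean()

# Smoke test with random tensors standing in for real recordings.
model = ECoGDenseNet()
with torch.no_grad():
    windows = torch.randn(100, 1, WIN_T, GRID_H, GRID_W)  # 100 ECoG windows
    frames = model(windows)                               # 100 logMel frames
    print(frames.shape, mel_correlation(frames, torch.randn(100, N_MELS)))

The dense connectivity, in which every layer sees the outputs of all preceding layers, keeps the parameter count low for a given depth; this is plausibly why the abstract calls such a topology well suited to the small per-participant datasets typical of ECoG studies.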
