
MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 167, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2020.114177

Keywords

Affective computing; Dilated convolutional neural network; Real-time speech emotion recognition; Parallel learning; Multi-learning trick (MLT); Raw audio clips

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science

Funding

National Research Foundation of Korea - Korean Government through the Ministry of Science and ICT [NRF-2020R1F1A1060659]

Summary

The paper proposes an end-to-end real-time SER model based on a dilated CNN (DCNN) that extracts emotional features and learns long-term contextual dependencies through a multi-learning strategy, achieving high recognition accuracy.

Abstract

Speech is the dominant means of communication among humans, and it is an efficient way to exchange information in human-computer interaction (HCI). Speech emotion recognition (SER) is an active research area that plays a crucial role in real-time applications, yet current SER systems lack real-time speech processing. To address this problem, we propose an end-to-end real-time SER model based on a one-dimensional dilated convolutional neural network (DCNN). Our model uses a multi-learning strategy to extract spatially salient emotional features and learn long-term contextual dependencies from the speech signals in parallel. We use a residual blocks with skip connection (RBSC) module to find correlations among the emotional cues, and a sequence learning (Seq_L) module to learn the long-term contextual dependencies in the input features. Furthermore, we use a fusion layer to concatenate these learned features for the final emotion recognition task. The model structure is simple, and it is capable of automatically learning salient discriminative features from the speech signals. We evaluated the model on the benchmark IEMOCAP and EMO-DB datasets and obtained recognition accuracies of 73% and 90%, respectively. The experimental results indicate the significance and efficiency of the proposed model and show that it can substantially assist the implementation of a real-time SER system. Hence, our model can process raw speech signals for emotion recognition using a lightweight dilated CNN architecture that implements the multi-learning trick (MLT) approach.
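The two-branch idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give kernel sizes, dilation rates, layer counts, or the type of the Seq_L module, so every name and value below (`dilated_conv1d`, `residual_block`, `sequence_branch`, the running-average stand-in for sequence learning, the dilation rates) is an assumption made only for illustration.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Same'-length 1D convolution with zero padding; kernel taps are spaced `dilation` samples apart."""
    center = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):  # positions outside the signal count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def residual_block(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Dilated convolution + ReLU, added back to the input through a skip connection."""
    y = relu(dilated_conv1d(x, kernel, dilation))
    return [a + b for a, b in zip(x, y)]


def spatial_branch(x: List[float], kernel: List[float], dilations: List[int]) -> List[float]:
    """Stack of residual blocks followed by global average pooling (assumed pooling choice)."""
    h = x
    for d in dilations:
        h = residual_block(h, kernel, d)
    return [sum(h) / len(h)]


def sequence_branch(x: List[float], decay: float = 0.5) -> List[float]:
    """Toy stand-in for a sequence learner: the final state of an exponential running average."""
    state = 0.0
    for a in x:
        state = decay * state + (1.0 - decay) * a
    return [state]


def fuse(x: List[float], kernel: List[float], dilations: List[int]) -> List[float]:
    """Fusion layer sketch: concatenate the features from the two parallel branches."""
    return spatial_branch(x, kernel, dilations) + sequence_branch(x)


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input samples seen by one output of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [0.1, -0.3, 0.5, 0.2, -0.1, 0.4, 0.0, -0.2]
    features = fuse(signal, kernel=[0.25, 0.5, 0.25], dilations=[1, 2, 4])
    print(len(features), receptive_field(3, [1, 2, 4]))
```

With a kernel of size 3 and the assumed dilation rates 1, 2, and 4, three stacked layers cover 15 input samples, which is why dilation is a common way to widen context on raw audio without adding parameters. In a trained model the fused feature vector would feed a classifier over the emotion classes.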
