Article

Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks

IEEE TRANSACTIONS ON MULTIMEDIA (2014)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 16, Issue 8, Pages 2203-2213

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2014.2360798

Keywords

Affective-salient discriminative feature analysis; convolutional neural networks; feature learning; speech emotion recognition

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Funding

National Natural Science Foundation of China [61272211, 61170126]
Six Talent Peaks Foundation of Jiangsu Province [DZXX-027]

Abstract

As an essential means of understanding human emotional behavior, speech emotion recognition (SER) has attracted a great deal of attention in human-centered signal processing. Accuracy in SER depends heavily on finding good affect-related, discriminative features. In this paper, we propose to learn affect-salient features for SER using convolutional neural networks (CNN). The training of the CNN involves two stages. In the first stage, unlabeled samples are used to learn local invariant features (LIF) using a variant of the sparse auto-encoder (SAE) with reconstruction penalization. In the second stage, LIF is used as the input to a feature extractor, salient discriminative feature analysis (SDFA), to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality, and discrimination for SER. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and language variation, and environment distortion) and outperforms several well-established SER features.
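To make the first training stage more concrete, the following is a minimal sketch of a generic sparse auto-encoder loss in NumPy. It is not the variant used in the paper, whose exact form the abstract does not give. The sketch assumes a textbook formulation (squared reconstruction error plus a KL-divergence sparsity penalty), and all names (`sae_loss`, `kl_sparsity`, `rho`, `beta`) and the random input data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def kl_sparsity(rho, rho_hat, eps=1e-8):
    """KL divergence between a target activation rho and mean activations rho_hat."""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)  # avoid log(0)
    return np.sum(
        rho * np.log(rho / rho_hat)
        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    )


def sae_loss(X, W_enc, b_enc, W_dec, b_dec, rho=0.05, beta=3.0):
    """Return (loss, hidden activations) for a one-layer sparse auto-encoder.

    Assumed textbook form, not the paper's objective:
    mean squared reconstruction error + beta * KL sparsity penalty.
    """
    H = sigmoid(X @ W_enc + b_enc)          # hidden features from unlabeled input
    X_hat = H @ W_dec + b_dec               # linear reconstruction
    recon = 0.5 * np.mean(np.sum((X - X_hat) ** 2, axis=1))
    rho_hat = H.mean(axis=0)                # average activation per hidden unit
    return recon + beta * kl_sparsity(rho, rho_hat), H


# Hypothetical unlabeled data: 32 patches with 20 dimensions each.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 20))
W_enc = 0.1 * rng.standard_normal((20, 8))
b_enc = np.zeros(8)
W_dec = 0.1 * rng.standard_normal((8, 20))
b_dec = np.zeros(20)

loss, H = sae_loss(X, W_enc, b_enc, W_dec, b_dec)
print(f"loss = {loss:.4f}, hidden shape = {H.shape}")
```

In a two-stage pipeline of the kind the abstract describes, hidden activations such as `H` would serve as the unsupervised features passed to the second, supervised stage. The SDFA objective itself is not sketched here because the abstract does not specify its form.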
