Article

Speech emotion recognition based on transfer learning from the FaceNet framework

Journal

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
Volume 149, Issue 2, Pages 1338-1345

Publisher

Acoustical Society of America / American Institute of Physics
DOI: 10.1121/10.0003530


Funding

  1. Jilin Provincial Science and Technology Department [20180201003GX]
  2. Jilin Province Development and Reform Commission [2019C053-4]

Abstract

Speech plays an important role in human-computer emotional interaction. FaceNet, a model widely used in face recognition, achieves great success thanks to its excellent feature extraction. In this study, we adapt the FaceNet model for speech emotion recognition. To apply the model to speech, signals are divided into segments at a fixed time interval, and each segment is transformed into a discrete waveform diagram and a spectrogram. The waveform and spectrogram are then fed separately into FaceNet for end-to-end training. Our empirical study shows that pretraining FaceNet on spectrograms is effective. We therefore pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. The CASIA dataset yields the most transferable knowledge because of its high recognition accuracy, which may be attributed to its clean signals. Our preliminary experiments show accuracies of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. Cross-training is then conducted on the datasets, and comprehensive experiments are performed. Experimental results indicate that the proposed approach outperforms state-of-the-art single-modal methods on the IEMOCAP dataset.
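The abstract describes a front end that splits each utterance into fixed-length segments and converts them into spectrograms before feeding FaceNet. The paper does not specify the exact implementation, so the following is only a minimal, standard-library-only sketch of that idea: overlapping frames, a Hann window, and a naive DFT magnitude spectrum per frame (frame length and hop size here are illustrative assumptions, not values from the paper).

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, zero-padding the last one."""
    frames = []
    for start in range(0, max(len(signal) - frame_len, 0) + 1, hop):
        frame = list(signal[start:start + frame_len])
        frames.append(frame + [0.0] * (frame_len - len(frame)))
    return frames

def dft_magnitude(frame):
    """Magnitude spectrum of one frame via a naive DFT (non-negative bins only)."""
    n = len(frame)
    # Hann window to reduce spectral leakage at the frame edges
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s))
    return spectrum

def spectrogram(signal, frame_len=256, hop=128):
    """List of per-frame magnitude spectra: one time-frequency image per segment."""
    return [dft_magnitude(f) for f in frame_signal(signal, frame_len, hop)]

# Toy example: a 1 kHz sine sampled at 8 kHz stands in for a speech segment.
sr = 8000
signal = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(2048)]
spec = spectrogram(signal)
# With frame_len=256 at 8 kHz, each bin spans 31.25 Hz, so the 1 kHz tone
# peaks at bin 32 (1000 / 31.25) in every frame.
```

In practice one would render these spectra as images (e.g. log-magnitude with a mel filter bank) to match the image input FaceNet expects; that rendering step is omitted here.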
