Article

Speech emotion recognition using hybrid spectral-prosodic features of speech signal/glottal waveform, metaheuristic-based dimensionality reduction, and Gaussian elliptical basis function network classifier

Journal

APPLIED ACOUSTICS
Volume 166, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.apacoust.2020.107360

Keywords

Speech emotion recognition; Quantum-behaved particle swarm optimization; Gaussian elliptical basis function

In this paper, a hybrid system consisting of three stages of feature extraction, dimensionality reduction, and feature classification is proposed for speech emotion recognition (SER). At the feature extraction stage, an information-rich spectral-prosodic hybrid feature vector, comprising the perceptual-spectral features of mel-frequency cepstral coefficients (MFCC), perceptual linear prediction coefficients (PLPC), and perceptual minimum variance distortionless response (PMVDR) coefficients, along with the prosodic feature of pitch (F0), is extracted for each frame. This feature vector is extracted from both the speech signal and its glottal waveform. The first- and second-order derivatives are then appended, yielding a high-dimensional hybrid feature vector. At the next stage, the dimensionality of this feature vector is reduced using a newly proposed quantum-behaved particle swarm optimization (QPSO)-based approach. Specifically, a new QPSO variant (termed pQPSO) is presented that uses a truncated Laplace distribution (TLD) to generate new particles, so that all candidate solutions remain within the valid range of the problem (unlike in standard QPSO). The contraction-expansion (CE) factor of the proposed pQPSO is also selected adaptively. Using this algorithm, an optimal discriminative dimensionality-reduction (projection) matrix is estimated, with emotion classification accuracy serving as the class-discriminative criterion. At the final stage, the reduced feature vectors are fed into a Gaussian elliptical basis function (GEBF) neural network classifier to recognize the speech emotion. To accelerate the training of the GEBF classifier, a fast scaled conjugate gradient (SCG) algorithm is employed, which does not require tuning a learning rate.
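The key idea behind pQPSO's particle generation, as described above, is sampling from a Laplace distribution truncated to the problem's valid range, so no particle falls outside the search bounds. The following is a minimal illustrative sketch of truncated-Laplace sampling via inverse-CDF sampling; the function name, parameters, and the surrounding QPSO update are assumptions, not the paper's exact formulation.

```python
import math
import random


def sample_truncated_laplace(mu, b, lo, hi):
    """Draw one sample from Laplace(mu, b) truncated to [lo, hi].

    Inverse-CDF sampling: draw u uniformly between CDF(lo) and CDF(hi),
    then map it back through the inverse CDF, so every sample is
    guaranteed to lie inside the valid range (the property the paper
    attributes to pQPSO, in contrast to standard QPSO).
    """
    def cdf(x):
        # Piecewise Laplace CDF.
        if x < mu:
            return 0.5 * math.exp((x - mu) / b)
        return 1.0 - 0.5 * math.exp(-(x - mu) / b)

    def inv_cdf(u):
        # Piecewise inverse of the Laplace CDF.
        if u < 0.5:
            return mu + b * math.log(2.0 * u)
        return mu - b * math.log(2.0 * (1.0 - u))

    u = random.uniform(cdf(lo), cdf(hi))
    return inv_cdf(u)
```

In a pQPSO-style update, `mu` would be placed near a particle's local attractor and `b` scaled by the adaptive contraction-expansion factor, with `lo`/`hi` taken from the variable's feasible bounds.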
Finally, the proposed method is evaluated on three standard emotional speech databases: the Berlin Database of Emotional Speech (EMODB), Surrey Audio-Visual Expressed Emotion (SAVEE), and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The experimental results showed that the proposed method was more accurate than state-of-the-art methods at detecting speech emotions. (C) 2020 Elsevier Ltd. All rights reserved.
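The GEBF classifier stage described in the abstract assigns each hidden unit an axis-aligned elliptical Gaussian response, i.e. a separate width per feature dimension rather than a single spherical width. A minimal forward-pass sketch is shown below; the function name, array shapes, and the plain linear output layer are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np


def gebf_forward(X, centers, widths, W, b):
    """Forward pass of a Gaussian elliptical basis function network.

    X:       (n, d) input feature vectors (after dimensionality reduction)
    centers: (h, d) basis-function centers
    widths:  (h, d) per-dimension widths -- the "elliptical" part:
             each unit stretches differently along each axis
    W, b:    (h, k) output weights and (k,) biases of a linear readout
    Returns: (n, k) class scores.
    """
    # Per-dimension normalized distance to every center: (n, h, d).
    diff = (X[:, None, :] - centers[None, :, :]) / widths[None, :, :]
    # Elliptical Gaussian activation of each hidden unit: (n, h).
    act = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    # Linear output layer: (n, k).
    return act @ W + b
```

In training, the centers, widths, and output weights would be fit with a scaled conjugate gradient routine, which (as the abstract notes) avoids hand-tuning a learning rate.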
