☆ 4.6 Article

Multi-feature deep supervised voiceprint adversarial network for depression recognition from speech

BIOMEDICAL SIGNAL PROCESSING AND CONTROL (2024)

Journal

BIOMEDICAL SIGNAL PROCESSING AND CONTROL

Volume 89

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.bspc.2023.105704

Keywords

Adversarial learning; Audio processing; Attention mechanism; Deep neural network; Depression recognition; Feature enhancement

Category

Engineering, Biomedical

AI summary

In this paper, the authors propose MFDS-VAN, a deep supervised voiceprint adversarial network for audio-based depression recognition. MFDS-VAN integrates acoustic features and the audio waveform to predict the depression score. Experimental results show that MFDS-VAN significantly enhances robustness and performance in speech-based depression recognition, achieving competitive results compared to recent audio-based methodologies.

Abstract

Depression can induce a range of physiological effects, leading to notable differences between the acoustic characteristics of individuals with depression and those without. Designing efficient algorithms that accurately identify depression from speech is a formidable challenge. In this paper, we propose the Multi-Feature Deep Supervised Voiceprint Adversarial Network (MFDS-VAN) for audio-based depression recognition. The MFDS-VAN takes extracted acoustic features and the audio waveform as input and predicts the depression score. To obtain more robust and discriminative spatial-temporal features associated with depression, the Encoding Network module merges long-term and short-term acoustic features with the unprocessed audio waveform, while the Regression Network module predicts the depression score. The Deep Supervised Regression algorithm combines GE2E clustering and Huber regression for better network optimization. Furthermore, to enhance the representation of the MFDS-VAN while diminishing the influence of individual voiceprint information, we propose the Voiceprint Adversarial Network. Experimental results on the AVEC 2013, AVEC 2014, and AVEC 2017 datasets demonstrate that the MFDS-VAN significantly enhances robustness and performance in speech-based depression recognition. Our model achieves competitive results when compared to recent audio-based methodologies.
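The abstract names Huber regression as one ingredient of the Deep Supervised Regression algorithm. As a rough illustration of that single ingredient only, the sketch below computes the standard Huber loss between predicted and reference scores in plain Python. It is not the authors' implementation: the function name `huber_loss`, the threshold `delta=1.0`, and the example scores are assumptions made for illustration, and the combination with GE2E clustering described in the paper is not reproduced here.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss over paired sequences of scores.

    The loss is quadratic for residuals up to `delta` and linear beyond it,
    which makes it less sensitive to large errors than squared error.
    `delta` is an illustrative default, not a value taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r                 # quadratic region
        else:
            total += delta * (r - 0.5 * delta)   # linear region
    return total / len(y_true)


# Hypothetical depression scores, invented for this example (not data from the paper).
true_scores = [10.0, 25.0, 3.0]
pred_scores = [10.5, 20.0, 3.0]
print(huber_loss(true_scores, pred_scores, delta=1.0))
```

In this toy call, the small residual (0.5) falls in the quadratic region, while the large residual (5.0) is penalized only linearly. That linear tail is the usual motivation for preferring Huber loss over squared error when some labels or predictions are far off.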
