
A Deep Multiscale Spatiotemporal Network for Assessing Depression From Facial Dynamics

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2022)

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING

Volume 13, Issue 3, Pages 1581-1592

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TAFFC.2020.3021755

Keywords

Depression; Feature extraction; Three-dimensional displays; Deep learning; Face recognition; Visualization; Convolution; Affective computing; depression detection; deep learning; 3D convolution neural network; face analysis; spatiotemporal expression recognition; multiscale processing

Categories

Computer Science, Artificial Intelligence; Computer Science, Cybernetics

Funding

Academy of Finland
Natural Sciences and Engineering Research Council of Canada

AI Summary

This article introduces a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), for effectively representing facial information related to depressive behaviours from videos. Experimental results show that the MSN architecture outperforms state-of-the-art methods in automatic depression recognition.

Abstract

Recently, deep learning models have been successfully employed in many video-based affective computing applications (e.g., detecting pain, stress, and Alzheimer's disease). One key application is automatic depression recognition, that is, the recognition of facial expressions associated with depressive behaviour. State-of-the-art deep learning algorithms for recognizing depression typically explore spatial and temporal information separately: they use 2D convolutional neural networks (CNNs) to analyze appearance information, and then either map facial feature variations or average the depression level over video frames. This approach is limited in its ability to represent the dynamic information that helps to discriminate accurately between depression levels. In contrast, models based on 3D CNNs can encode spatio-temporal relationships directly, but they rely on temporal information with a fixed range and a single receptive field. This limits their ability to capture variations in facial expression over diverse temporal ranges and to exploit diverse facial areas. In this article, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours from videos. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and receptive field sizes, which allows the MSN to explore a wide range of spatio-temporal variations in facial expressions. Experimental results on two benchmark datasets show that our MSN architecture is effective, outperforming state-of-the-art methods in automatic depression recognition.
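
The parallel-branch idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' MSN implementation: the branch sizes, the fixed averaging kernels, and the function names `conv3d_same`, `box_kernel`, and `multiscale_block` are all assumptions chosen for illustration. The sketch only shows how branches with different temporal depths and spatial receptive fields respond differently to the same video clip, and how their outputs can be stacked as channels.

```python
from itertools import product


def conv3d_same(video, kernel):
    """Naive single-channel 3D convolution (cross-correlation, as in CNN
    layers) with zero padding, so the output has the same size as the input.

    video:  nested list indexed [t][y][x]
    kernel: nested list indexed [dt][dy][dx], odd size along every axis
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2  # padding on each side
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t, y, x in product(range(T), range(H), range(W)):
        acc = 0.0
        for dt, dy, dx in product(range(kt), range(kh), range(kw)):
            tt, yy, xx = t + dt - pt, y + dy - ph, x + dx - pw
            # Positions outside the clip count as zeros (zero padding).
            if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                acc += kernel[dt][dy][dx] * video[tt][yy][xx]
        out[t][y][x] = acc
    return out


def box_kernel(kt, kh, kw):
    """Placeholder averaging kernel (assumption); a trained network
    would learn these weights instead."""
    w = 1.0 / (kt * kh * kw)
    return [[[w] * kw for _ in range(kh)] for _ in range(kt)]


def multiscale_block(video, branch_sizes):
    """Apply one convolution per branch in parallel and stack the
    results as output channels (one channel per branch)."""
    return [conv3d_same(video, box_kernel(*size)) for size in branch_sizes]


# Toy clip: 4 frames of 5x5 pixels with a single bright pixel in frame 1.
clip = [[[0.0] * 5 for _ in range(5)] for _ in range(4)]
clip[1][2][2] = 1.0

# Hypothetical branch sizes given as (temporal depth, height, width).
branches = [(1, 3, 3), (3, 3, 3), (3, 5, 5)]
features = multiscale_block(clip, branches)
```

In this toy setup, the branch with temporal depth 1 only responds in the frame that contains the bright pixel, while the branches with temporal depth 3 also respond in the neighbouring frames, and the 5x5 branch responds over a wider facial area than the 3x3 branches. A practical implementation would use learned multi-channel kernels in a deep learning framework rather than nested lists.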
