4.7 Article

A Deep Multiscale Spatiotemporal Network for Assessing Depression From Facial Dynamics

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 13, Issue 3, Pages 1581-1592

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAFFC.2020.3021755

Keywords

Depression; Feature extraction; Three-dimensional displays; Deep learning; Face recognition; Visualization; Convolution; Affective computing; depression detection; deep learning; 3D convolution neural network; face analysis; spatiotemporal expression recognition; multiscale processing

Funding

  1. Academy of Finland
  2. Natural Sciences and Engineering Research Council of Canada

This article introduces a novel 3D CNN architecture (MSN) for effectively representing facial information related to depressive behaviors from videos. Experimental results show that the MSN architecture outperforms state-of-the-art methods in automatic depression recognition.
Recently, deep learning models have been successfully employed in many video-based affective computing applications (e.g., detecting pain, stress, and Alzheimer's disease). One key application is automatic depression recognition, that is, the recognition of facial expressions associated with depressive behaviour. State-of-the-art deep learning algorithms for recognizing depression typically explore spatial and temporal information separately: they use 2D convolutional neural networks (CNNs) to analyze appearance information, and then either map facial feature variations or average the depression level over video frames. This approach is limited in its ability to represent the dynamic information that helps to discriminate accurately between depression levels. In contrast, models based on 3D CNNs can directly encode spatio-temporal relationships, but they rely on temporal information with a fixed range and a single receptive field. This limits their ability to capture facial expression variations of diverse ranges and to exploit diverse facial areas. In this article, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours from videos. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and receptive field sizes, which allows the MSN to explore a wide range of spatio-temporal variations in facial expressions. Experimental results on two benchmark datasets show that our MSN architecture is effective, outperforming state-of-the-art methods in automatic depression recognition.
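The idea of parallel convolutional layers with different temporal depths and receptive field sizes can be illustrated with simple shape arithmetic. The sketch below is not the authors' implementation: the `Branch` class, the kernel sizes (3, 5, 7), the channel counts, and the 16-frame 112x112 clip are all hypothetical values chosen for illustration. It only shows that branches with different odd kernel sizes, each padded by half its kernel, yield outputs of the same size that can be concatenated along the channel axis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Branch:
    """One parallel 3D convolution branch (hypothetical settings)."""
    temporal_depth: int  # kernel extent along the frame axis
    spatial_size: int    # kernel extent along height and width
    out_channels: int


def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output length of a convolution padded by kernel // 2 on each side."""
    pad = kernel // 2
    return (size + 2 * pad - kernel) // stride + 1


def multiscale_block_shape(
    clip: Tuple[int, int, int, int], branches: List[Branch]
) -> Tuple[int, int, int, int]:
    """Shape after running all branches in parallel and concatenating channels.

    clip is (channels, frames, height, width); input channels do not affect
    the output shape, so they are ignored here.
    """
    if not branches:
        raise ValueError("at least one branch is required")
    _, frames, height, width = clip
    shapes = []
    for b in branches:
        if b.temporal_depth % 2 == 0 or b.spatial_size % 2 == 0:
            raise ValueError("odd kernel sizes keep branch outputs aligned")
        shapes.append((
            b.out_channels,
            conv_out(frames, b.temporal_depth),
            conv_out(height, b.spatial_size),
            conv_out(width, b.spatial_size),
        ))
    # Branches can only be concatenated if (frames, height, width) agree.
    if len({s[1:] for s in shapes}) != 1:
        raise ValueError("branch outputs are not aligned")
    return (sum(s[0] for s in shapes),) + shapes[0][1:]


# Hypothetical example: three scales applied to a 16-frame RGB face clip.
example_branches = [Branch(3, 3, 16), Branch(5, 5, 16), Branch(7, 7, 16)]
example_shape = multiscale_block_shape((3, 16, 112, 112), example_branches)
```

With these assumed settings, each branch sees a different spatio-temporal window (3, 5, or 7 frames and pixels wide) while the concatenated output keeps the clip's frame count and resolution, so a following layer can combine short and long facial dynamics at every position.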
