4.7 Article

Touch Gesture and Emotion Recognition Using Decomposed Spatiotemporal Convolutions

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIM.2022.3147338

Keywords

Robots; Emotion recognition; Three-dimensional displays; Sensor arrays; Pressure sensors; Spatiotemporal phenomena; Service robots; Decomposed spatiotemporal convolution; emotion recognition; human-robot tactile interaction; touch gesture recognition

Funding

  1. China Postdoctoral Science Foundation [2021M692390]
  2. Tianjin Natural Science Foundation [20JCZDJC00150, 20JCYBJC00320]

Touch is an essential means of conveying emotions and intentions in human communication. This study addresses the recognition of touch gestures and emotions by social robots, using a pressure sensor array to build a dataset. The proposed method uses a decomposed spatiotemporal convolution for feature representation, which improves the nonlinear expression ability of the model and reduces computational cost. Experimental results demonstrate the effectiveness of the proposed method and verify the feasibility of a robot perceiving human emotions through touch.
Touch is one of the most essential and effective means of conveying affective feelings and intentions in human communication. For a social robot, the ability to recognize human touch gestures and emotions could help realize efficient and natural human-robot interaction. To this end, an affective touch gesture dataset involving ten kinds of touch gestures and 12 kinds of discrete emotions was built using a pressure sensor array. The acquired touch gesture samples are three-dimensional (3-D) spatiotemporal signals that capture both shape appearance and motion dynamics. Owing to the excellent performance of convolutional neural networks (CNNs), spatiotemporal CNNs have proven effective for 3-D signal classification. However, the large number of parameters and the high complexity of training 3-D convolution kernels remain open problems. In this article, a decomposed spatiotemporal convolution was designed for feature representation from the raw touch gesture samples. Specifically, the 3-D kernel was factorized into three 1-D kernels by tensor decomposition. The proposed convolution has a simpler but deeper architecture than standard 3-D convolution, which improves the nonlinear expression ability of the model. In addition, the computational cost can be reduced without compromising recognition accuracy. Using a user-dependent test mode, the proposed method yields accuracies of up to 92.41% and 72.47% for touch gesture and emotion recognition, respectively. Experimental results demonstrate the effectiveness of the proposed method and, at the same time, preliminarily verify the feasibility of a robot perceiving human emotions through touch.
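
The factorization of a 3-D kernel into three 1-D kernels can be illustrated with a small sketch. It is not the authors' implementation: it covers only the simplest (rank-1, purely linear) case, and the function names `conv3d_valid`, `outer3`, `separable_conv3d`, and `kernel_weights` are hypothetical helpers introduced here. The sketch shows that when a 3-D kernel is the outer product of a temporal, a vertical, and a horizontal 1-D kernel, one 3-D convolution gives the same result as three 1-D convolutions applied in sequence, while the number of weights per kernel drops from kt*kh*kw to kt+kh+kw.

```python
# Illustrative sketch only (assumed rank-1, linear case); not the paper's code.
from itertools import product


def conv3d_valid(x, k):
    """Valid 3-D cross-correlation of a nested-list volume x with kernel k."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    oT, oH, oW = T - kt + 1, H - kh + 1, W - kw + 1
    out = [[[0] * oW for _ in range(oH)] for _ in range(oT)]
    for t, i, j in product(range(oT), range(oH), range(oW)):
        out[t][i][j] = sum(
            x[t + a][i + b][j + c] * k[a][b][c]
            for a, b, c in product(range(kt), range(kh), range(kw))
        )
    return out


def outer3(u, v, w):
    """Build the rank-1 3-D kernel k[a][b][c] = u[a] * v[b] * w[c]."""
    return [[[a * b * c for c in w] for b in v] for a in u]


def separable_conv3d(x, u, v, w):
    """Apply three 1-D kernels in sequence: along time, height, then width."""
    k_t = [[[a]] for a in u]       # shape (kt, 1, 1)
    k_h = [[[b] for b in v]]       # shape (1, kh, 1)
    k_w = [[list(w)]]              # shape (1, 1, kw)
    return conv3d_valid(conv3d_valid(conv3d_valid(x, k_t), k_h), k_w)


def kernel_weights(kt, kh, kw):
    """Weights per kernel: (full 3-D kernel, three 1-D kernels)."""
    return kt * kh * kw, kt + kh + kw


# Toy integer volume standing in for a (time, rows, cols) pressure sequence.
x = [[[(7 * t + 3 * i + j) % 5 for j in range(5)] for i in range(4)] for t in range(6)]
u, v, w = [1, 2, -1], [0, 1, 2], [1, -1, 3]

full = conv3d_valid(x, outer3(u, v, w))
separable = separable_conv3d(x, u, v, w)
assert full == separable  # exact, because all values are integers
print(kernel_weights(3, 3, 3))
```

The equivalence holds exactly only because no nonlinearity sits between the three 1-D stages. The abstract's remark that the decomposed design is deeper and improves nonlinear expression suggests that the actual network inserts activations between stages, which this sketch deliberately omits.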
