4.7 Article

Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 126, Issue 2-4, Pages 430-439

Publisher

SPRINGER
DOI: 10.1007/s11263-016-0957-7

Keywords

Gesture recognition; Deep neural networks

Funding

  1. Agency for Innovation by Science and Technology in Flanders (IWT)

Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, numerous research questions remain open. Current research suggests using a simple temporal feature pooling strategy to take the temporal aspect of video into account. We demonstrate that this method is not sufficient for gesture recognition, where temporal information is more discriminative than in general video classification tasks. We explore deep architectures for gesture recognition in video and propose a new end-to-end trainable neural network architecture incorporating temporal convolutions and bidirectional recurrence. Our main contributions are twofold: first, we show that recurrence is crucial for this task; second, we show that adding temporal convolutions leads to significant improvements. We evaluate the different approaches on the Montalbano gesture recognition dataset, where we achieve state-of-the-art results.
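
To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' architecture. It assumes random per-frame feature vectors, randomly initialised weights, a plain tanh recurrence, and hypothetical function names and layer sizes chosen only for illustration. It shows that mean pooling over time gives the same result when the frame order is reversed, while a temporal convolution followed by a bidirectional recurrence produces order-dependent per-frame outputs.

```python
import numpy as np

def temporal_mean_pool(frames):
    """Average per-frame features over time: (T, D) -> (D,)."""
    return frames.mean(axis=0)

def temporal_conv(frames, kernel):
    """1D convolution over the time axis with 'same' zero padding.

    frames: (T, D_in); kernel: (K, D_in, D_out) with odd K -> (T, D_out).
    As is usual in deep learning, this is a cross-correlation.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(frames, ((pad, pad), (0, 0)))
    return np.stack([
        np.tensordot(padded[t:t + k], kernel, axes=([0, 1], [0, 1]))
        for t in range(frames.shape[0])
    ])

def rnn_pass(x, w_in, w_rec):
    """Simple tanh recurrence over time: (T, D) -> (T, H)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)

def bidirectional_rnn(x, fwd, bwd):
    """Concatenate a forward pass and a time-reversed backward pass."""
    forward = rnn_pass(x, *fwd)
    backward = rnn_pass(x[::-1], *bwd)[::-1]
    return np.concatenate([forward, backward], axis=1)

# Illustrative sizes (assumptions): 8 frames, 4 features, 5 conv channels, 3 hidden units.
rng = np.random.default_rng(0)
T, D, C, H = 8, 4, 5, 3
frames = rng.normal(size=(T, D))
reversed_frames = frames[::-1]

kernel = rng.normal(size=(3, D, C)) * 0.5
fwd = (rng.normal(size=(C, H)) * 0.5, rng.normal(size=(H, H)) * 0.5)
bwd = (rng.normal(size=(C, H)) * 0.5, rng.normal(size=(H, H)) * 0.5)

def sequence_model(x):
    """Temporal convolution followed by bidirectional recurrence."""
    return bidirectional_rnn(temporal_conv(x, kernel), fwd, bwd)

# Pooling cannot tell a sequence from its reversal.
pooled_same = np.allclose(temporal_mean_pool(frames),
                          temporal_mean_pool(reversed_frames))

# The sequence model yields one output per frame, and it depends on frame order.
seq_out = sequence_model(frames)
seq_out_rev = sequence_model(reversed_frames)
order_sensitive = not np.allclose(seq_out, seq_out_rev[::-1])
```

A gesture and its time-reversed counterpart (for example, a hand moving left versus right) would therefore be indistinguishable after mean pooling of per-frame features, which is one intuition for why the abstract argues that pooling alone is not sufficient for this task.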
