Article

DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition

Journal

NEUROCOMPUTING
Volume 444, Issue -, Pages 319-331

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2020.05.118

Keywords

Human action recognition; Deep learning; Convolutional neural network; Long-range temporal; LSTM

Funding

  1. National Key Research and Development Project
  2. Sichuan Science and Technology Program [2020YJ0207]
  3. Foundation for Department of Transportation of Henan Province [2019J22]
  4. Fundamental Research Funds for the Central Universities [A09205020520013]
  5. National Natural Science Foundation of China [61772436]


A novel deep learning model is proposed in this paper to capture the spatial and temporal patterns of human actions from videos, combining a sample representation learner, a Densely-connected Bi-directional LSTM network, and a fusion of appearance and motion modalities. These techniques improve the effectiveness and robustness of long-range action recognition, yielding performance that surpasses existing approaches on the UCF101 and HMDB51 benchmark datasets.
Although deep learning has achieved promising progress recently, action recognition remains a challenging task, due to cluttered backgrounds, diverse scenes, occlusions, viewpoint variations and camera motions. In this paper, we propose a novel deep learning model to capture the spatial and temporal patterns of human actions from videos. A sample representation learner is proposed to extract the video-level temporal feature, which combines sparse temporal sampling and long-range temporal learning to form an efficient and effective training strategy. To boost the effectiveness and robustness of modeling long-range action recognition, a Densely-connected Bi-directional LSTM (DB-LSTM) network is proposed to model the visual and temporal associations in both forward and backward directions. The layers are stacked and integrated with dense skip-connections to improve the capability of temporal pattern modeling. Two modalities, appearance and motion, are integrated with a fusion module to further improve performance. Experiments conducted on two benchmark datasets, UCF101 and HMDB51, demonstrate that the proposed DB-LSTM model achieves promising performance, outperforming the state-of-the-art approaches for action recognition. (c) 2020 Elsevier B.V. All rights reserved.
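The abstract does not reproduce the paper's implementation, but the core structural idea, stacked bidirectional recurrent layers where each layer receives the concatenation of the input and all previous layers' outputs (dense skip-connections), can be sketched in a few lines. The following is a toy NumPy illustration only: a simple tanh recurrence stands in for the LSTM cell, and all dimensions, layer counts, and initializations are invented for the example rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bi_rnn(x, hidden):
    """Toy bidirectional recurrent layer (a tanh RNN stands in for an
    LSTM cell). x: (T, d_in) -> (T, 2*hidden), forward and backward
    hidden states concatenated per time step."""
    T, d_in = x.shape
    Wf = rng.standard_normal((hidden, d_in + hidden)) * 0.1
    Wb = rng.standard_normal((hidden, d_in + hidden)) * 0.1
    hf = np.zeros(hidden)
    fw = []
    for t in range(T):                      # forward pass over time
        hf = np.tanh(Wf @ np.concatenate([x[t], hf]))
        fw.append(hf)
    hb = np.zeros(hidden)
    bw = [None] * T
    for t in reversed(range(T)):            # backward pass over time
        hb = np.tanh(Wb @ np.concatenate([x[t], hb]))
        bw[t] = hb
    return np.concatenate([np.stack(fw), np.stack(bw)], axis=1)

def db_lstm_sketch(x, num_layers=3, hidden=8):
    """Densely-connected stack: layer k is fed the concatenation of the
    original input and the outputs of all earlier layers."""
    feats = [x]
    for _ in range(num_layers):
        inp = np.concatenate(feats, axis=1)  # dense skip-connections
        feats.append(bi_rnn(inp, hidden))
    return feats[-1]

# Hypothetical input: 5 sparsely sampled frames, each a 16-dim CNN feature.
clip_feature = rng.standard_normal((5, 16))
out = db_lstm_sketch(clip_feature)
print(out.shape)  # (5, 16): 5 time steps, 2 * hidden = 16 features
```

The dense connectivity means later layers see both low-level and high-level temporal features, which is the property the abstract credits with improving long-range temporal modeling; a real implementation would replace `bi_rnn` with trained bidirectional LSTM layers.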

