Article

Two-Level Attention Model Based Video Action Recognition Network

Journal

IEEE ACCESS
Volume 7, Pages 118388-118401

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2019.2936628

Keywords

Action recognition; LSTM; recurrent region attention; video frame attention

Funding

  1. National Natural Science Foundation of China [61773105, 61374147]
  2. Fundamental Research Funds for the Central Universities [N182008004]
  3. Natural Science Foundation of Liaoning Province [20170540675]
  4. Scientific Research Project of Liaoning Educational Department [LQGD2017023]

Abstract

Complex background environments, lighting conditions, and other action-irrelevant visual information in video frames introduce substantial redundancy and noise into spatial action features, seriously degrading the accuracy of action recognition. To address this, we propose a recurrent region attention cell that captures the action-relevant regional visual information in the spatial features; building on this cell, and exploiting the temporal sequential nature of video, we propose a Recurrent Region Attention model (RRA). The recurrent region attention cell in the RRA iterates along the temporal sequence of the video, so the attention performance of the RRA improves progressively. Second, we propose a Video Frame Attention model (VFA) that highlights the more important frames in the whole action video sequence, reducing the interference caused by similarity between heterogeneous action video sequences. Finally, we propose an end-to-end trainable network: the Two-level Attention Model based video action recognition network (TAMNet). We experimented on two video action recognition benchmark datasets: UCF101 and HMDB51. The experiments show that our end-to-end TAMNet can reliably focus on the more important frames in a video sequence and effectively capture the action-relevant regional visual information in the spatial features of each frame. Inspired by the two-stream structure, we also construct a two-modality TAMNet. Under the same training conditions, the two-modality TAMNet achieves the best performance on both datasets.
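The two attention levels described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned attention parameters and the LSTM-driven recurrence of the actual RRA/VFA are simplified here to plain vectors, purely to show how region-level weights attend over spatial positions within a frame and how frame-level weights attend over the frames of a sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_attention(feature_map, query):
    """Weight the R regional features of one frame by relevance to a query.

    feature_map: (R, D) regional spatial features of a single frame.
    query: (D,) context vector (in the paper this would come from the
    recurrent state; here it is just a plain vector for illustration).
    Returns the attended (D,) feature and the (R,) attention weights.
    """
    scores = feature_map @ query            # relevance score per region
    weights = softmax(scores)               # attention distribution over regions
    return weights @ feature_map, weights   # weighted sum of regional features

def frame_attention(frame_features, scorer):
    """Weight the T per-frame features by importance for the whole sequence.

    frame_features: (T, D) one feature vector per frame.
    scorer: (D,) stand-in for the VFA's learned scoring parameters.
    Returns the attended (D,) video feature and the (T,) frame weights.
    """
    weights = softmax(frame_features @ scorer)  # importance per frame
    return weights @ frame_features, weights    # weighted sum over frames

# Toy usage: 10 frames, 4 regions per frame, 8-dim features.
rng = np.random.default_rng(0)
regions = rng.standard_normal((10, 4, 8))
query = rng.standard_normal(8)
scorer = rng.standard_normal(8)

per_frame = np.stack([region_attention(f, query)[0] for f in regions])
video_feature, frame_weights = frame_attention(per_frame, scorer)
```

In the real model the query would be produced recurrently frame by frame (which is what lets the RRA's attention sharpen over the sequence), and both attention modules are trained end-to-end with the classifier.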
