Online action proposal generation using spatio-temporal attention network

NEURAL NETWORKS (2022)

Journal

NEURAL NETWORKS

Volume 153, Pages 518-529

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.neunet.2022.06.032

Keywords

Temporal action proposal; Action detection; Spatial attention; Temporal attention

Funding

Institute of Information & communications Technology Planning & Evaluation (IITP), South Korea - Korea government (MSIT) [2021-0-02068]
Artificial Intelligence Innovation Hub
National Research Foundation of Korea (NRF) - Korea government (MSIT) [NRF-2021R1A2C3011169]
MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center), South Korea support program [IITP-2022-2020-0-01808]

Automated Summary

This study proposes a spatio-temporal attention network for online action proposal generation that produces precise action boundaries, handles noisy features effectively, and is suitable for online tasks.

Abstract

Temporal action proposal generation aims to generate temporal boundaries containing action instances. In real-time applications such as surveillance cameras, autonomous driving, and traffic monitoring, the online localization and recognition of human activities occurring in short temporal intervals are important areas of research. Existing approaches to temporal action proposal generation operate offline and aggregate only frame-level features along the temporal dimension. These offline methods also generate many redundant, irrelevant proposal regions as temporal boundaries, which leads to high computational cost and slow processing and makes them unsuitable for online tasks. In this study, we propose a novel spatio-temporal attention network for online action proposal generation, in contrast to existing offline proposal generation methods. The proposed approach incorporates the inter-dependency between the spatial and temporal context information of each incoming video clip to generate more relevant online temporal action proposals. First, we propose a windowed spatial attention module to capture the inter-spatial relationship between the features of incoming frames. This module produces a more robust clip-level feature representation and efficiently deals with noisy features such as occlusion or background scenes. Second, we introduce a temporal attention module that captures relevant temporal dynamics jointly with the localized spatial information, modeling long-range inter-frame temporal relationships, since most real-life online videos are untrimmed. By applying these two attention modules sequentially, the proposed spatio-temporal network is able to generate precise action boundaries at a particular instant of time. In addition, the model generates a smaller set of discriminative temporal action proposals while maintaining a low computational cost and high processing speed suitable for online settings.
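
The idea of applying a spatial attention step followed by a temporal attention step to one incoming clip can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes plain softmax dot-product attention pooling with fixed query vectors, and every name in it (`attention_pool`, `spatial_then_temporal`, the toy sizes `T`, `R`, `D`) is hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Weighted average of `vectors`; weights are a softmax over
    scaled dot products between each vector and `query`."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(v, query) / scale for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def spatial_then_temporal(clip, spatial_query, temporal_query):
    """clip: list of frames, each frame a list of region feature vectors.
    Assumption: one query vector per stage stands in for learned parameters."""
    # Stage 1 (spatial): pool the regions of each frame in the window,
    # so that low-scoring regions (e.g. background) contribute less.
    frame_feats = [attention_pool(frame, spatial_query)[0] for frame in clip]
    # Stage 2 (temporal): pool the per-frame features across the window.
    clip_feat, temporal_weights = attention_pool(frame_feats, temporal_query)
    return clip_feat, temporal_weights


# Toy clip: T frames, R regions per frame, D-dimensional features.
random.seed(0)
T, R, D = 4, 3, 8
clip = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(R)] for _ in range(T)]
spatial_q = [random.gauss(0, 1) for _ in range(D)]
temporal_q = [random.gauss(0, 1) for _ in range(D)]

clip_feat, temporal_weights = spatial_then_temporal(clip, spatial_q, temporal_q)
```

Here `clip_feat` is a single clip-level vector and `temporal_weights` gives one weight per frame. In a trained model the queries would be learned, and a further head (not sketched here) would turn the clip-level feature into boundary scores.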
