Article

Fully Convolutional Network for Multiscale Temporal Action Proposals

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 20, Issue 12, Pages 3428-3438

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2018.2839534

Keywords

Temporal convolution; receptive field; multiple scale ranges; duration regression

Funding

  1. Shanghai Science and Technology Committee [14511110100]

Just as object proposals help localize objects within images, temporal action proposals can facilitate the extraction of semantic segments and reduce the computation required for temporal action localization in untrimmed videos. In this paper, we propose a fully convolutional network for multiscale temporal action proposals (FCN-TAP) that uses only temporal convolutions to retrieve accurate action proposals from video sequences. Using gated linear units, our network enables simple but powerful inference, and because its computations can be parallelized, it significantly improves performance compared with previous recurrent models. To capture more temporal context with fewer parameters, we apply dilated convolutions to expand the receptive fields of our network. Moreover, we divide the receptive fields into multiple scale ranges and then refine the corresponding temporal boundaries using duration regression at each scale. To generate suitable training segments with arbitrary durations, we design a new strategy for selecting sampled candidates within the corresponding scale range. Experiments on the THUMOS'14 and ActivityNet datasets demonstrate the power of our method: FCN-TAP outperforms other state-of-the-art methods while achieving a remarkable speedup. Additional experiments show that our method generates high-quality proposals and improves the localization stage of existing action detection pipelines.
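The abstract relies on two building blocks, gated linear units (GLUs) and dilated temporal convolutions, whose effect on receptive-field size is easy to check numerically. The following is a minimal pure-Python sketch under stated assumptions, not the authors' FCN-TAP implementation: it assumes kernel size 3, stride 1, and dilations that double per layer (1, 2, 4, 8), and the helper names (`receptive_field`, `dilated_conv1d`, `glu_dilated_layer`, `scale_ranges`) are hypothetical. The per-layer split of receptive fields into scale ranges is likewise an illustrative assumption; the paper's actual scale boundaries are not stated here.

```python
import math
from typing import List, Tuple


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field (in time steps) of stacked stride-1 1-D convolutions."""
    # Each layer with kernel k and dilation d adds (k - 1) * d steps of context.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def scale_ranges(kernel_size: int, dilations: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical split (assumption): layer l covers durations in (RF_{l-1}, RF_l]."""
    ranges = []
    prev = 1
    for layer in range(1, len(dilations) + 1):
        rf = receptive_field(kernel_size, dilations[:layer])
        ranges.append((prev, rf))
        prev = rf
    return ranges


def dilated_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Length-preserving dilated 1-D convolution (cross-correlation), zero padded."""
    k = len(weights)
    pad = (k - 1) * dilation // 2  # symmetric padding for odd kernel sizes
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - pad + i * dilation  # taps are `dilation` steps apart
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def glu_dilated_layer(x: List[float], w_a: List[float], w_b: List[float],
                      dilation: int) -> List[float]:
    """Gated linear unit over two parallel dilated convolutions: A * sigmoid(B)."""
    a = dilated_conv1d(x, w_a, dilation)  # linear path
    b = dilated_conv1d(x, w_b, dilation)  # gate path
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]
    print(receptive_field(3, dilations))      # 31
    print(receptive_field(3, [1, 1, 1, 1]))   # 9
    print(scale_ranges(3, dilations))         # [(1, 3), (3, 7), (7, 15), (15, 31)]
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    # Identity linear path with an all-zero gate: sigmoid(0) = 0.5 halves the input.
    print(glu_dilated_layer(impulse, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], 2))
    # [0.0, 0.0, 0.5, 0.0, 0.0]
```

With kernel size 3, four dilated layers see 31 time steps while using the same number of weights that four undilated layers spend to see only 9, which is the "more temporal context with fewer parameters" trade-off described above. Because each GLU output at time t depends only on the input, not on a previous hidden state, every time step can be computed in parallel, unlike a recurrent model.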
