4.7 Article

Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2021.3070179

Keywords

Visualization; Feature extraction; Encoding; Skeleton; Task analysis; Image recognition; Image coding; Cross-view 3-D action recognition; discriminative viewpoint instance discovery; Fisher vector (FV); multi-view dynamic image (MVDI); viewpoint aggregation

Funding

  1. National Natural Science Foundation of China [61502187, 61876211]
  2. Natural Science Foundation of Hunan Province [2018JJ2052]
  3. Equipment Pre-Research Field Fund of China [61403120405]
  4. Fundamental Research Funds for the Central Universities [2019kfyXKJC024]
  5. National Key Laboratory Open Fund of China [6142113180211]
  6. Singapore Government's Research, Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering Domain) [A18A1b0045]

The article addresses the challenge of dramatic imaging viewpoint variation for action recognition in depth video, proposing a discriminative MVDI fusion method based on multi-instance learning to improve cross-view 3-D action recognition. The method focuses on improving the view tolerance of visual features and uses Fisher vector encoding to strengthen their discriminative power.
Dramatic imaging viewpoint variation is a critical challenge for action recognition in depth video. One feasible way to address it is to enhance the view tolerance of visual features while still maintaining strong discriminative capacity. The multi-view dynamic image (MVDI) is a recently proposed 3-D action representation that compactly encodes human motion information and 3-D visual cues. However, it is still view-sensitive. To improve its performance, we propose a discriminative MVDI fusion method based on multi-instance learning (MIL). Specifically, the dynamic images (DIs) from different observation viewpoints are regarded as the instances for 3-D action characterization. After being encoded with the Fisher vector (FV), they are aggregated by sum-pooling to yield the representative 3-D action signature. Our insight is that viewpoint aggregation helps to enhance view tolerance, and that FV can map the raw DI feature to a higher dimensional feature space to promote discriminative power. In addition, a discriminative viewpoint instance discovery method is proposed to discard the viewpoint instances that are unfavorable for action characterization. Extensive experiments on five data sets demonstrate that our approach can significantly enhance the performance of cross-view 3-D action recognition. It is also applicable to cross-view 3-D object recognition. The source code is available at https://github.com/3huo/ActionView.
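
The encode-then-aggregate pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that each viewpoint yields a small set of local descriptors, that a diagonal-covariance GMM has already been trained, and that the Fisher vector takes the common mean-and-variance gradient form. The names `fisher_vector` and `aggregate_views`, the `keep` mask standing in for the outcome of discriminative viewpoint instance discovery, and the final L2 normalization are all hypothetical choices made for illustration.

```python
import math


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode N local descriptors (each of length D) under a diagonal GMM.

    weights, means, sigmas describe K components and are assumed to be
    pre-trained. Returns a vector of length 2*K*D: gradients with respect
    to the component means, followed by gradients with respect to the
    component standard deviations.
    """
    n, k, d = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * d for _ in range(k)]
    grad_sigma = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Log of (weight * Gaussian density) for each component.
        logp = []
        for w, mu, sg in zip(weights, means, sigmas):
            ll = math.log(w)
            for xi, m, s in zip(x, mu, sg):
                ll += -0.5 * ((xi - m) / s) ** 2 - math.log(s)
                ll -= 0.5 * math.log(2.0 * math.pi)
            logp.append(ll)
        # Soft assignment (posterior) computed in a numerically stable way.
        top = max(logp)
        post = [math.exp(v - top) for v in logp]
        total = sum(post)
        for j in range(k):
            gamma = post[j] / total
            for i in range(d):
                z = (x[i] - means[j][i]) / sigmas[j][i]
                grad_mu[j][i] += gamma * z
                grad_sigma[j][i] += gamma * (z * z - 1.0)
    fv = []
    for j in range(k):
        fv += [g / (n * math.sqrt(weights[j])) for g in grad_mu[j]]
    for j in range(k):
        fv += [g / (n * math.sqrt(2.0 * weights[j])) for g in grad_sigma[j]]
    return fv


def aggregate_views(view_vectors, keep=None):
    """Sum-pool per-view vectors and L2-normalize the result.

    keep is a hypothetical boolean mask marking which viewpoint instances
    to retain; views flagged False are discarded before pooling.
    """
    if keep is None:
        keep = [True] * len(view_vectors)
    kept = [v for v, flag in zip(view_vectors, keep) if flag]
    if not kept:
        raise ValueError("at least one viewpoint instance must be kept")
    pooled = [sum(vals) for vals in zip(*kept)]
    norm = math.sqrt(sum(p * p for p in pooled))
    return [p / norm for p in pooled] if norm > 0 else pooled


# Toy example: 3 viewpoints, 2 descriptors each, D = 2, K = 2 (made-up values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
views = [
    [[0.1, 0.2], [0.9, 1.1]],
    [[0.0, -0.1], [1.2, 0.8]],
    [[0.5, 0.5], [0.4, 0.6]],
]
encoded = [fisher_vector(v, weights, means, sigmas) for v in views]
signature = aggregate_views(encoded, keep=[True, True, False])
```

With D = 2 and K = 2, each 2-dimensional descriptor set becomes an 8-dimensional vector, which is the sense in which the encoding lifts the raw feature into a higher dimensional space. Sum-pooling then makes the signature independent of the order in which viewpoints are presented.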
