4.7 Article

Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2021.3070179

Keywords

Visualization; Feature extraction; Encoding; Skeleton; Task analysis; Image recognition; Image coding; Cross-view 3-D action recognition; discriminative viewpoint instance discovery; Fisher vector (FV); multi-view dynamic image (MVDI); viewpoint aggregation

Funding

  1. National Natural Science Foundation of China [61502187, 61876211]
  2. Natural Science Foundation of Hunan Province [2018JJ2052]
  3. Equipment Pre-Research Field Fund of China [61403120405]
  4. Fundamental Research Funds for the Central Universities [2019kfyXKJC024]
  5. National Key Laboratory Open Fund of China [6142113180211]
  6. Singapore Government's Research, Innovation and Enterprise 2020 Plan (Advanced Manufacturing and Engineering Domain) [A18A1b0045]

Ask authors/readers for more resources

The article addresses the challenge of dramatic imaging viewpoint variation for action recognition in depth video, proposing a discriminative MVDI fusion method via multi-instance learning to enhance cross-view 3-D action recognition performance. The method emphasizes enhancing view-tolerance of visual features and utilizing Fisher vector for better discriminative power.
Dramatic imaging viewpoint variation is the critical challenge toward action recognition for depth video. To address this, one feasible way is to enhance view-tolerance of visual feature, while still maintaining strong discriminative capacity. Multi-view dynamic image (MVDI) is the most recently proposed 3-D action representation manner that is able to compactly encode human motion information and 3-D visual clue well. However, it is still view-sensitive. To leverage its performance, a discriminative MVDI fusion method is proposed by us via multi-instance learning (MIL). Specifically, the dynamic images (DIs) from different observation viewpoints are regarded as the instances for 3-D action characterization. After being encoded using Fisher vector (FV), they are then aggregated by sum-pooling to yield the representative 3-D action signature. Our insight is that viewpoint aggregation helps to enhance view-tolerance. And, FV can map the raw DI feature to the higher dimensional feature space to promote the discriminative power. Meanwhile, a discriminative viewpoint instance discovery method is also proposed to discard the viewpoint instances unfavorable for action characterization. The wide-range experiments on five data sets demonstrate that our proposition can significantly enhance the performance of cross-view 3-D action recognition. And, it is also applicable to cross-view 3-D object recognition. The source code is available at https://github.com/3huo/ActionView.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available