Article

Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition

MACHINE VISION AND APPLICATIONS (2021)

Journal

MACHINE VISION AND APPLICATIONS

Volume 32, Issue 6, Pages -

Publisher

SPRINGER

DOI: 10.1007/s00138-021-01249-8

Keywords

Action recognition; Early fusion; Intermediate fusion; Late fusion; Deep learning

Categories

Computer Science, Artificial Intelligence; Computer Science, Cybernetics; Engineering, Electrical & Electronic

Summary

Multimodal action recognition combines multiple image modalities for more robust recognition, using early, intermediate, or late fusion. By investigating each fusion level in depth, the authors propose new schemes that better combine features from different modalities and report results that match or exceed those of existing methods.

Abstract

Multimodal action recognition techniques combine several image modalities (RGB, Depth, Skeleton, and InfraRed) for more robust recognition. According to the fusion level in the action recognition pipeline, we can distinguish three families of approaches: early fusion, where the raw modalities are combined ahead of feature extraction; intermediate fusion, where the features of each modality are concatenated before classification; and late fusion, where the modality-wise classification results are combined. After reviewing the literature, we identified the principal defects of each category, which we try to address. First, we investigate more deeply early-stage fusion, which has been poorly explored in the literature. Second, because intermediate fusion protocols operate on the feature map irrespective of the particularities of human action, we propose a new scheme in which we optimally combine modality-wise features. Third, as most late fusion solutions use handcrafted rules, which are prone to human bias and far from real-world peculiarities, we adopt a neural learning strategy to extract significant features from data rather than assuming that artificial rules are correct. We validated our findings on two challenging datasets. Our results were as good as or better than their literature counterparts.
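To make the three fusion levels named in the abstract concrete, the following is a minimal NumPy sketch and not the authors' architecture. Everything in it is an assumption made for illustration: the toy `extract_features` and `classify` helpers, the tensor sizes, the choice of only two modalities (RGB and depth), and the random weights that stand in for trained networks. It only shows where in the pipeline the modalities meet in each case.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(x, w):
    """Toy feature extractor (hypothetical): flatten, linear map, ReLU."""
    return np.maximum(x.reshape(x.shape[0], -1) @ w, 0.0)

def classify(f, w):
    """Toy classifier (hypothetical): linear layer followed by softmax."""
    z = f @ w
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Illustrative sizes, not taken from the paper.
batch, h, w_px, n_feat, n_cls = 4, 8, 8, 16, 5
rgb = rng.normal(size=(batch, h, w_px, 3))    # stand-in RGB frames
depth = rng.normal(size=(batch, h, w_px, 1))  # stand-in depth maps

# Early fusion: combine the raw modalities, then run ONE feature extractor.
early_in = np.concatenate([rgb, depth], axis=-1)
w_early = rng.normal(size=(h * w_px * 4, n_feat))
w_cls_early = rng.normal(size=(n_feat, n_cls))
p_early = classify(extract_features(early_in, w_early), w_cls_early)

# Intermediate fusion: one extractor per modality, features concatenated
# before a single shared classifier.
w_rgb = rng.normal(size=(h * w_px * 3, n_feat))
w_depth = rng.normal(size=(h * w_px * 1, n_feat))
f_rgb = extract_features(rgb, w_rgb)
f_depth = extract_features(depth, w_depth)
fused = np.concatenate([f_rgb, f_depth], axis=1)
w_cls_mid = rng.normal(size=(2 * n_feat, n_cls))
p_mid = classify(fused, w_cls_mid)

# Late fusion: one full classifier per modality, scores combined afterwards.
w_cls_rgb = rng.normal(size=(n_feat, n_cls))
w_cls_depth = rng.normal(size=(n_feat, n_cls))
p_rgb = classify(f_rgb, w_cls_rgb)
p_depth = classify(f_depth, w_cls_depth)

# (a) A handcrafted rule: plain average of the per-modality scores.
p_late_avg = (p_rgb + p_depth) / 2.0

# (b) A learned combiner: a small layer over the concatenated scores.
# Its weights are random here; in practice they would be trained on data.
w_late = rng.normal(size=(2 * n_cls, n_cls))
p_late_learned = classify(np.concatenate([p_rgb, p_depth], axis=1), w_late)
```

Variant (b) is one simple way to replace a fixed combination rule with parameters fitted to data, in the spirit of the neural learning strategy the abstract mentions; the abstract does not specify the actual form of the authors' late-fusion network.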
