Article

A Discriminative Vectorial Framework for Multi-Modal Feature Representation

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 24, Issue -, Pages 1503-1514

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2021.3066118

Keywords

Semantics; Correlation; Task analysis; Emotion recognition; Visualization; Transforms; Image recognition; Audio emotion recognition; cross-modal analysis; discriminative correlation maximization; image analysis and recognition; knowledge discovery; multi-modal feature representation; multi-modal hashing

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Summary

With rapid advances in sensory and computing technology, attention to multi-modal data sources that represent the same pattern or phenomenon has increased. This paper proposes a discriminative vectorial framework for multi-modal feature representation in knowledge discovery, using multi-modal hashing and discriminative correlation maximization analysis. The proposed framework minimizes semantic similarity among different modalities and extracts intrinsic discriminative representations across multiple data sources, leading to improved results in various applications.

Abstract

Due to rapid advances in sensory and computing technology, multi-modal data sources that represent the same pattern or phenomenon have attracted growing attention. As a result, finding ways to extract useful information from these multi-modal data sources has quickly become a necessity. In this paper, a discriminative vectorial framework is proposed for multi-modal feature representation in knowledge discovery by employing multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis. Specifically, the proposed framework jointly minimizes the semantic similarity among different modalities by MH and extracts intrinsic discriminative representations across multiple data sources by DCM analysis, enabling a novel vectorial framework for multi-modal feature representation. Moreover, the proposed feature representation strategy is analyzed and further optimized for the canonical and non-canonical cases, respectively. Consequently, the generated feature representation leads to effective, high-quality utilization of the input data sources, producing improved, sometimes quite impressive, results in various applications. The effectiveness and generality of the proposed framework are demonstrated using classical features and deep neural network (DNN) based features, with applications to image and multimedia analysis and recognition tasks, including data visualization, face recognition, object recognition, cross-modal (text-image) recognition, and audio emotion recognition. Experimental results show that the proposed solutions are superior to state-of-the-art statistical machine learning (SML) and DNN algorithms.
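The abstract does not specify how MH is constructed, so the following is only a generic illustration of what hashing a feature vector means: real-valued features are mapped to a short binary code, and codes are compared by Hamming distance. It uses random hyperplane (sign) hashing as a stand-in technique, not the paper's MH; a multi-modal method would presumably learn its projections rather than draw them at random. All function names, the toy feature values, and the 16-bit code length are assumptions made for this sketch.

```python
import random


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space.

    Assumption: random projections stand in for whatever learned
    projections an actual multi-modal hashing method would use.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_features(features, hyperplanes):
    """Map a real-valued feature vector to a binary code.

    Each bit records which side of one hyperplane the vector falls on.
    """
    return [
        1 if sum(w * x for w, x in zip(plane, features)) >= 0.0 else 0
        for plane in hyperplanes
    ]


def hamming_distance(code_a, code_b):
    """Count the bit positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical toy feature vectors (not taken from the paper).
planes = make_hyperplanes(dim=4, n_bits=16, seed=0)
x = [0.9, 0.1, -0.3, 0.5]
x_near = [0.8, 0.2, -0.25, 0.55]   # small perturbation of x
x_opposite = [-v for v in x]       # points in the opposite direction

code = hash_features(x, planes)
code_near = hash_features(x_near, planes)
code_opposite = hash_features(x_opposite, planes)

print("code for x:          ", code)
print("distance to x_near:  ", hamming_distance(code, code_near))
print("distance to -x:      ", hamming_distance(code, code_opposite))
```

With sign hashing, a vector and its negation fall on opposite sides of every hyperplane, so their codes differ in every bit, while vectors separated by a small angle tend to share most bits. That property is what makes binary codes usable as a compact proxy for similarity.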
