Article

Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume 30, Issue -, Pages 4384-4394

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIP.2021.3071688

Keywords

Training; Kernel; Interpolation; Data models; Geometry; Learning systems; Deep learning; Multi-modal learning; multi-view learning; cross-modal retrieval; nonlinear embeddings; supervised embeddings; RBF interpolators

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

AI Summary

Numerous approaches exist in the literature for learning low-dimensional representations of multi-modal data collections, yet the generalizability of multi-modal nonlinear embeddings to unseen data has been overlooked. The study highlights the importance of the regularity of interpolation functions for successful generalization in multi-modal classification and retrieval problems, alongside criteria such as between-class separation and cross-modal alignment. The proposed multi-modal nonlinear representation learning algorithm, inspired by theoretical findings, shows promising performance in applications such as multi-modal image classification and cross-modal image-text retrieval.

Abstract

While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.
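To make the idea of an interpolator with controlled Lipschitz regularity concrete, the following is a minimal sketch for a single modality, not the authors' algorithm. It assumes a Gaussian RBF interpolator (the keywords mention RBF interpolators, but the kernel form, the scale `sigma`, the small `ridge` term, and all function names here are illustrative assumptions). It fits the interpolator to given training embeddings, extends it to new points, and computes a simple upper bound on its Lipschitz constant. The method described above optimizes the embeddings jointly with this kind of regularity, which this sketch does not attempt.

```python
import numpy as np

def fit_rbf_interpolator(X, Y, sigma=1.0, ridge=1e-8):
    """Fit coefficients C so that f(X[i]) is approximately Y[i], where
    f(x) = sum_i C[i] * exp(-||x - X[i]||^2 / sigma^2).
    The tiny ridge term is an assumption added for numerical stability."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / sigma**2)
    return np.linalg.solve(K + ridge * np.eye(len(X)), Y)

def rbf_embed(X_new, X, C, sigma=1.0):
    """Evaluate the fitted interpolator at new points (out-of-sample embedding)."""
    sq_dists = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma**2) @ C

def lipschitz_upper_bound(C, sigma=1.0):
    """Upper bound on the Lipschitz constant of f.
    Each Gaussian kernel exp(-r^2 / sigma^2) has slope at most
    sqrt(2) * exp(-1/2) / sigma (attained at r = sigma / sqrt(2)),
    so by the triangle inequality the bound is that slope times
    the sum of the coefficient norms."""
    kernel_slope = np.sqrt(2.0) * np.exp(-0.5) / sigma
    return kernel_slope * np.linalg.norm(C, axis=1).sum()

# Hypothetical toy data: 20 training samples in R^5 with 2-D target embeddings.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))
Y_train = rng.normal(size=(20, 2))
C = fit_rbf_interpolator(X_train, Y_train, sigma=1.0)

X_test = rng.normal(size=(4, 5))
print(rbf_embed(X_test, X_train, C, sigma=1.0).shape)
print(lipschitz_upper_bound(C, sigma=1.0))
```

In this sketch a larger `sigma` or smaller coefficient norms give a smaller bound, which is one way to read the claim that interpolator regularity matters alongside class separation and cross-modal alignment.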
