SkeletonNet: A Hybrid Network With a Skeleton-Embedding Process for Multi-View Image Representation Learning

IEEE TRANSACTIONS ON MULTIMEDIA (2019)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 21, Issue 11, Pages 2916-2929

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2019.2912735

Keywords

Semantics; Correlation; Visualization; Skeleton; Matrix decomposition; Kernel; Laplace equations; Unsupervised multi-view subspace learning; semantic inconsistency; tensor factorization; deep auto-encoders

Funding

National Natural Science Foundation of China [61836002, 61771457, 61620106009, 61732007, 61672497, U1636214, 61572488, 61772494, 61472389]
National Basic Research Program of China (973 Program) [2015CB351800]
Key Research Program of Frontier Sciences [CAS: QYZDJ-SSW-SYS013]

Abstract

Multi-view representation learning plays a fundamental role in multimedia data analysis. Conventional models adopt specific inter-view alignment principles under the assumption that different views share a common latent subspace. However, when dealing with views on diverse semantic levels, the view-specific characteristics are neglected, and the divergent inconsistency of similarity measurements hinders sufficient information sharing. This paper proposes a hybrid deep network that introduces tensor factorization into the multi-view deep auto-encoder. The network adopts a skeleton-embedding process for unsupervised multi-view subspace learning. It takes full account of view-specific characteristics and leverages the strengths of both shallow and deep architectures for modeling low- and high-level views, respectively. We first formulate the high-level-view semantic distribution as the underlying skeleton structure of the learned subspace, and then infer the local tangent structures according to the affinity propagation of low-level-view geometric correlations. As a consequence, a more discriminative subspace representation can be learned, from global semantic pivots to local geometric details. Experimental comparisons on three benchmark image datasets show the promising performance and flexibility of our model.
