Article

A novel multi-view clustering approach via proximity-based factorization targeting structural maintenance and sparsity challenges for text and image categorization

Journal

INFORMATION PROCESSING & MANAGEMENT
Volume 58, Issue 4, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2021.102546

Keywords

Multi-view learning; Clustering; High-order proximity; Spectral clustering; Non-negative Matrix Factorization

This study proposes a novel clustering approach that handles sparsity in real-world multimedia data by utilizing both local and global structures, and demonstrates strong robustness to sparse data.
Multi-view data consist of several feature sets that describe the same objects from different perspectives, and such data are common in real-world applications. Multi-view clustering of text and image data faces two substantial challenges: structure preservation and sparsity. Existing methods do not preserve the structure of the data space, and recent improvements address only the local structure. Preserving local structure alone is not sufficient to handle sparsity in these data. In this paper, we propose a novel clustering approach, called Proximity-based Multi-View Non-negative Matrix Factorization (PMVNMF), which uses the local and global structure of the data space jointly to handle sparsity in real-world multimedia (text and image) data. For each view, the 1-step and 2-step transition probability matrices are constructed as the first-order and second-order proximity matrices, which uncover the latent local and global geometric structures, respectively. A view-specific proximity matrix is then constructed by integrating these two proximity matrices. Finally, Non-negative Matrix Factorization (NMF) is applied with graph regularization and consensus regularization, so that the factorization takes the integrated graph structures into account and reveals the latent common structure shared by all representations. The algorithm captures the underlying structure of the data space and is robust to sparse data. We conduct experiments on six real-world datasets (two text and four image datasets) and compare the proposed algorithm with eight baseline approaches. Six evaluation metrics are used: accuracy, F-score, precision, recall, NMI, and entropy. The results show that the proposed algorithm outperforms the baselines.
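The proximity construction described in the abstract can be illustrated with a minimal sketch. It covers only the per-view proximity step and is not the authors' implementation. The Gaussian affinity, the bandwidth `sigma`, the mixing weight `alpha`, and the function names are all assumptions made for illustration; the abstract does not specify how affinities are computed or how the two proximity matrices are integrated.

```python
import numpy as np

def transition_matrix(X, sigma=1.0):
    """1-step transition probabilities between the rows (samples) of X.

    Assumption: affinities come from a Gaussian kernel on squared
    Euclidean distances; the abstract does not name a kernel.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-transitions
    # Row-normalise so each row is a probability distribution.
    return A / A.sum(axis=1, keepdims=True)

def view_proximity(X, alpha=0.5, sigma=1.0):
    """Combine first-order (local) and second-order (global) proximity.

    Assumption: a convex combination with weight alpha stands in for
    the integration step, whose exact form the abstract does not give.
    """
    P1 = transition_matrix(X, sigma)  # first-order: 1-step transitions
    P2 = P1 @ P1                      # second-order: 2-step transitions
    return alpha * P1 + (1.0 - alpha) * P2

# Two synthetic views of the same 6 samples (hypothetical data).
rng = np.random.default_rng(0)
views = [rng.random((6, 4)), rng.random((6, 3))]
proximities = [view_proximity(X) for X in views]
```

Because the product of two row-stochastic matrices is row-stochastic, each integrated matrix in `proximities` still has rows that sum to one. In a full pipeline, a matrix like this would act as the graph in the regularization term of the factorization.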
