Article

Semi-Supervised Clustering via Cannot Link Relationship for Multiview Data

Journal

IEEE Transactions on Circuits and Systems for Video Technology

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TCSVT.2022.3197230

Keywords

Clustering algorithms; Tensors; Unsupervised learning; Semi-supervised learning; Supervised learning; Clustering methods; Robustness; Multi-view clustering; semi-supervised learning; cannot-link constraint


With interest in multi-view clustering increasing due to the diversity of data modalities, this paper proposes an effective semi-supervised multi-view spectral clustering algorithm. By incorporating prior knowledge from pre-set labels, minimizing a tensor Schatten p-norm, and applying cannot-link constraints, the algorithm outperforms current methods in terms of stability and accuracy. Experimental results on several datasets demonstrate the algorithm's effectiveness and potential applications.
Due to the diversity of data modalities, research interest in multi-view clustering is growing steadily within large-scale data analytics. However, most current multi-view clustering methods rely on unsupervised learning, which leads to unpredictable results and algorithmic instability. Moreover, they ignore the diversity of the graphs, which is undesirable in practical applications because each view has its own characteristic properties. To address these problems, and inspired by the strong performance of semi-supervised learning in machine learning, we propose an effective semi-supervised multi-view spectral clustering algorithm. We use pre-set labels as prior knowledge to infer the overall distribution of the remaining unlabeled data. The tensor Schatten p-norm is minimized to mine the mutual information hidden across multiple views. Meanwhile, we use cannot-link relationships as an additional semi-supervised constraint to update the graph. According to experimental results on five datasets, our proposed algorithm is generally 5%-10% better than the comparison algorithms, and it is relatively fast, with a computational complexity of O(T(n² log n + n² + u²l + ulc + uc log c)), where T denotes the number of iterations and n, l, and u represent the total number of samples, the number of labeled samples, and the number of unlabeled samples, respectively. These results indicate that the proposed method has broad application prospects.
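To make the two ingredients named in the abstract more concrete, the Python sketch below illustrates (a) a Schatten p-norm computed from singular values and (b) a cannot-link update that removes affinities between constrained sample pairs. This is a minimal illustration under standard definitions, not the authors' implementation; the paper applies the norm to a tensor coupling the multiple views, and the function names schatten_p_norm and apply_cannot_link are hypothetical.

# Hypothetical sketch, not the authors' code: matrix Schatten p-norm and a
# simple cannot-link update on a symmetric affinity graph.
import numpy as np

def schatten_p_norm(M, p=0.5):
    """Schatten p-norm: (sum_i sigma_i^p)^(1/p), with sigma_i the singular values of M."""
    sigma = np.linalg.svd(M, compute_uv=False)
    return np.power(np.sum(np.power(sigma, p)), 1.0 / p)

def apply_cannot_link(W, cannot_link_pairs):
    """Zero out affinities between samples known to lie in different clusters,
    keeping the graph symmetric."""
    W = W.copy()
    for i, j in cannot_link_pairs:
        W[i, j] = 0.0
        W[j, i] = 0.0
    return W

# Toy usage on a random 5-node affinity graph.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
W = (A + A.T) / 2            # symmetric affinity matrix
np.fill_diagonal(W, 0.0)     # no self-loops
W = apply_cannot_link(W, [(0, 3), (1, 4)])
print(schatten_p_norm(W, p=0.5))

Zeroing the affinity of a cannot-link pair simply prevents the spectral step from grouping those two samples; the matrix-level norm above only conveys the flavor of the low-rank penalty that the paper imposes across views.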
