Article

Semi-supervised concept factorization for document clustering

Journal

INFORMATION SCIENCES
Volume 331, Pages 86-98

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2015.10.038

Keywords

Concept factorization; Locally consistent concept factorization; Semi-supervised document clustering

Funding

  1. National Natural Science Foundation of China [61272297, 61402207, 61373093]
  2. Natural Science Foundation of Jiangsu Province of China [BK20140008]
  3. Natural Science Foundation of the Jiangsu Higher Education Institutions of China [13KJA520001]
  4. Qing Lan Project

Nonnegative Matrix Factorization (NMF) and Concept Factorization (CF) are two popular methods for finding a low-rank approximation of a nonnegative matrix. Unlike NMF, CF can be applied to matrices containing negative values and can also be performed in kernel space. Many methods built on NMF and CF, such as Graph regularized Nonnegative Matrix Factorization (GNMF) and Locally Consistent Concept Factorization (LCCF), can significantly improve clustering performance. However, these are unsupervised learning methods. To improve clustering performance using supervisory information, this paper proposes Semi-Supervised Concept Factorization (SSCF), which incorporates pairwise constraints into CF as reward and penalty terms. This can guarantee that data points belonging to the same cluster in the original space remain in the same cluster in the transformed space. Experimental results on document clustering show that the proposed algorithm outperforms state-of-the-art algorithms (KM, NMF, CF, LCCF, GNMF, PCCF) in terms of accuracy and mutual information. © 2015 Elsevier Inc. All rights reserved.
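
To make the CF baseline in the abstract concrete, here is a minimal sketch of plain, unsupervised concept factorization, which approximates X by X W V^T using the standard multiplicative update rules on K = X^T X. It is not the SSCF method: the abstract does not give the form of the reward and penalty terms, so they are left out. The function names, the toy term-document matrix, the iteration count, and the small constant `eps` are all assumptions made for illustration, and K is assumed to be entrywise nonnegative.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cf_objective(K, W, V):
    """||X - X W V^T||_F^2 expressed through K = X^T X."""
    n, k = len(W), len(W[0])
    KW = matmul(K, W)
    trace_K = sum(K[i][i] for i in range(n))
    cross = sum(V[i][j] * KW[i][j] for i in range(n) for j in range(k))
    WtKW = matmul(transpose(W), KW)
    VtV = matmul(transpose(V), V)
    quad = sum(WtKW[a][b] * VtV[b][a] for a in range(k) for b in range(k))
    return trace_K - 2.0 * cross + quad

def multiplicative_step(M, numer, denom, eps=1e-12):
    """Element-wise M * numer / denom; eps (an assumption) avoids 0/0."""
    return [[m * p / (q + eps) for m, p, q in zip(rm, rp, rq)]
            for rm, rp, rq in zip(M, numer, denom)]

def concept_factorization(X, k, iters=200, seed=0):
    """Plain CF on a nonnegative X whose columns are documents."""
    rng = random.Random(seed)
    K = matmul(transpose(X), X)          # n x n inner products between documents
    n = len(K)
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    V = [[rng.random() for _ in range(k)] for _ in range(n)]
    history = [cf_objective(K, W, V)]
    for _ in range(iters):
        # W <- W * (K V) / (K W V^T V)
        W = multiplicative_step(W, matmul(K, V),
                                matmul(matmul(K, W), matmul(transpose(V), V)))
        # V <- V * (K W) / (V W^T K W), using the updated W
        KW = matmul(K, W)
        V = multiplicative_step(V, KW, matmul(V, matmul(transpose(W), KW)))
        history.append(cf_objective(K, W, V))
    return W, V, history

# Toy term-document matrix (made-up data): 4 terms (rows), 6 documents (columns).
X = [
    [2.0, 3.0, 2.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 3.0, 2.0, 2.0],
    [0.0, 1.0, 0.0, 2.0, 3.0, 1.0],
]
W, V, history = concept_factorization(X, k=2)

# Assign each document to the concept with the largest coefficient in its row of V.
labels = [max(range(len(row)), key=lambda j: row[j]) for row in V]
print("objective: %.4f -> %.4f" % (history[0], history[-1]))
print("labels:", labels)
```

Because the updates only ever touch K, the same loop should work with a kernel matrix in place of X^T X, which is the kernel-space property the abstract mentions. A semi-supervised variant along the lines described would add terms to the objective that reward must-link pairs and penalize cannot-link pairs; those terms are not sketched here.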
