Article

Extracting elite pairwise constraints for clustering

Journal

NEUROCOMPUTING
Volume 99, Pages 124-133

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.neucom.2012.06.013

Keywords

Semi-supervised; Elite pairwise constraints; Clustering

Funding

  1. US National Science Foundation (NSF) [CCF-0905337]
  2. Natural Science Foundation of China [61175062, 61033012]
  3. Fundamental Research Funds for the Central Universities [DUT12JR02]
  4. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations [0905337]

Semi-supervised clustering under pairwise constraints (i.e., must-links and cannot-links) has been a hot topic in the data mining community in recent years. Because pairwise constraints provided by different domain experts may conflict with each other, much research has evaluated the effects of such noise on semi-supervised clustering. In this paper, we introduce elite pairwise constraints, comprising elite must-link (EML) and elite cannot-link (ECL) constraints. In contrast to traditional constraints, both EML and ECL constraints are required to be satisfied in every optimal partition (i.e., a partition that minimizes the criterion function). Therefore, these new constraints cannot conflict with one another. First, we prove that it is NP-hard to obtain EML or ECL constraints. Then, a heuristic method named Limit Crossing is proposed to obtain a fraction of those new constraints. In practice, this method can always retrieve many EML or ECL constraints. To evaluate the effectiveness of Limit Crossing, multi-partition based and distance based methods are also proposed in this paper to generate faux elite pairwise constraints. Extensive experiments have been conducted on both UCI and synthetic data sets using a semi-supervised clustering algorithm named COP-KMedoids. Experimental results demonstrate that COP-KMedoids under EML and ECL constraints generated by Limit Crossing can outperform COP-KMedoids under either faux constraints or no constraints. © 2012 Elsevier B.V. All rights reserved.
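
To make the definition of elite constraints concrete, the following is a minimal brute-force sketch, not the paper's Limit Crossing method. It enumerates every assignment of a tiny one-dimensional data set to k clusters, keeps all optimal partitions, and reports the pairs that are together in every one of them (EML) or separated in every one of them (ECL). The function names `kmedoids_cost` and `elite_constraints` are hypothetical, and the criterion function is assumed here to be the k-medoids sum of distances to each cluster's medoid; the abstract does not state the exact criterion. The exponential enumeration is only feasible for a handful of points, which is consistent with the hardness result stated above.

```python
from itertools import combinations, product


def kmedoids_cost(points, labels, k):
    """Assumed criterion: for each cluster, the smallest total distance
    from one member (the medoid) to all members, summed over clusters."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return float("inf")  # require exactly k non-empty clusters
        total += min(sum(abs(m - q) for q in members) for m in members)
    return total


def elite_constraints(points, k, tol=1e-9):
    """Brute-force EML/ECL pairs (as index pairs) for 1-D points."""
    n = len(points)
    best_cost, best = float("inf"), []
    # Enumerate all k**n label assignments (exponential; illustration only).
    for labels in product(range(k), repeat=n):
        cost = kmedoids_cost(points, labels, k)
        if cost == float("inf"):
            continue
        if cost < best_cost - tol:
            best_cost, best = cost, [labels]
        elif abs(cost - best_cost) <= tol:
            best.append(labels)
    eml, ecl = [], []
    for i, j in combinations(range(n), 2):
        if all(lab[i] == lab[j] for lab in best):
            eml.append((i, j))  # together in every optimal partition
        elif all(lab[i] != lab[j] for lab in best):
            ecl.append((i, j))  # separated in every optimal partition
    return eml, ecl


eml, ecl = elite_constraints([0.0, 1.0, 10.0, 11.0], k=2)
print(eml)  # [(0, 1), (2, 3)]
print(ecl)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

When several partitions tie for the optimum, fewer pairs qualify. For the points 0, 1, 2 with k = 2, the partitions {0, 1}, {2} and {0}, {1, 2} are both optimal under this assumed criterion, so no pair is an EML constraint and only the pair (0, 2) is an ECL constraint.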
