Article

An Adaptive Density Peaks Clustering Method With Fisher Linear Discriminant

Journal

IEEE ACCESS
Volume 7, Pages 72936-72955

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2019.2918952

Keywords

Density peaks clustering; kernel density estimation function; density estimation entropy; Fisher linear discriminant; reduction

Funding

  1. National Natural Science Foundation of China [61772176, 61402153, 61370169]
  2. China Postdoctoral Science Foundation [2016M602247]
  3. Plan for Scientific Innovation Talent of Henan Province [184100510003]
  4. Key Scientific and Technological Project of Henan Province [182102210362]
  5. Young Scholar Program of Henan Province [2017GGJS041]
  6. Natural Science Foundation of Henan Province [182300410130]

Clustering is one of the most important topics in data mining and machine learning. The density peaks clustering (DPC) algorithm is a well-known density-based clustering method that handles non-spherical clusters efficiently and effectively. However, its local density and distance measures are simple and tend to ignore the correlation and similarity between samples, and manually set parameters strongly influence the clustering results; as a result, DPC performs poorly on high-dimensional datasets. To address these issues, this paper presents an adaptive DPC algorithm with Fisher linear discriminant for clustering complex datasets, called ADPC-FLD. First, a kernel density estimation function is introduced to calculate the local density of the sample points, and the Pearson correlation coefficient between samples is used as a weight to construct a weighted Euclidean distance function that measures the distance between samples; this accounts for both the spatial structure and the correlation of the samples. Second, a novel density estimation entropy is proposed, and the density estimation parameters are selected adaptively by minimizing this entropy according to the distribution characteristics of the data, which effectively eliminates the influence of manual setting. Third, an adaptive cluster center selection strategy is designed to avoid both the error caused by choosing noise points as cluster centers and the uncertainty of selecting cluster centers manually. Finally, the Fisher linear discriminant algorithm is used to eliminate irrelevant information and reduce the dimensionality of high-dimensional data, after which the adaptive DPC method is applied to six synthetic datasets, thirteen UCI datasets, and seven gene expression datasets and compared with other related algorithms. The experimental results on these 26 datasets show that the proposed algorithm significantly outperforms several outstanding clustering approaches in terms of clustering accuracy and efficiency.
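
For orientation, the baseline DPC procedure that the abstract builds on can be sketched as follows. This is a minimal illustration, not the ADPC-FLD method: the abstract gives no formulas, so the Gaussian kernel, the plain (unweighted) Euclidean distance, and the top-gamma center rule are assumptions, and the function name `density_peaks` and its parameters `bandwidth` and `n_clusters` are hypothetical. Both parameters are set by hand here, which is exactly the manual step the paper says it replaces with entropy-based and adaptive selection.

```python
import numpy as np

def density_peaks(X, bandwidth, n_clusters):
    """Baseline density peaks clustering with a Gaussian-kernel density.

    Assumption: illustrative sketch only, not the paper's ADPC-FLD.
    """
    n = X.shape[0]
    # Pairwise (unweighted) Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: sum of Gaussian kernels (self term included, so rho >= 1).
    rho = np.exp(-(dist / bandwidth) ** 2).sum(axis=1)

    # Visit points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)            # distance to nearest denser point
    parent = np.full(n, -1)        # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its max distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Centers: points with both high density and high separation.
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]

    # Each remaining point inherits the label of its nearest denser point.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta


# Usage on two well-separated synthetic blobs (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels, centers, rho, delta = density_peaks(X, bandwidth=0.5, n_clusters=2)
```

The densest point has the largest density and the largest separation, so barring exact ties it is always among the selected centers and every other point can reach a labeled point by following its denser neighbors. Swapping the distance matrix for a correlation-weighted one, or choosing `bandwidth` by minimizing an entropy criterion, would be the natural places to plug in the ideas the abstract describes.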
