4.7 Article

Efficient Computation and Visualization of Multiple Density-Based Clustering Hierarchies

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 33, Issue 8, Pages 3075-3089

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2019.2962412

Keywords

Clustering; density-based clustering; hierarchical clustering; data mining; relative neighborhood graph

Funding

  1. NSERC, Canada
  2. CNPq, under the program Science without Borders, Brazil

Ask authors/readers for more resources

HDBSCAN*, a cutting-edge hierarchical clustering method based on density, can be challenging to optimize for the parameter mpts. This paper introduces an approach that allows for the efficient computation of multiple HDBSCAN* hierarchies for a range of mpts values, achieving over a hundred hierarchies for a computational cost equivalent to running HDBSCAN* twice.
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While a small change in mpts typically leads to a small change in the clustering structure, choosing a good mpts value can be challenging: depending on the data distribution, a high or low mpts value may be more appropriate, and certain clusters may reveal themselves at different values. To explore results for a range of mpts values, one has to run HDBSCAN* for each value independently, which can be computationally impractical. In this paper, we propose an approach to efficiently compute all HDBSCAN* hierarchies for a range of mpts values by building upon results from computational geometry to replace HDBSCAN*'s complete graph with a smaller equivalent graph. An experimental evaluation shows that our approach can obtain over one hundred hierarchies for the computational cost equivalent to running HDBSCAN* about twice, which corresponds to a speedup of more than 60 times, compared to running HDBSCAN* independently that many times. We also propose a series of visualizations that allow users to analyze a collection of hierarchies for a range of mpts values, along with case studies that illustrate how these analyses are performed.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available