Article

Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms

Journal

Journal of Computational and Graphical Statistics
Volume 19, Issue 2, Pages 354-376

Publisher

American Statistical Association
DOI: 10.1198/jcgs.2009.08054

Keywords

Cluster overlap; Eccentricity of ellipsoid; Mclust; MixSim; Mixture distribution; Parallel distribution plots

Funding

  1. National Science Foundation CAREER [DMS-0437555]
  2. National Institutes of Health [DC-0006740]

A new method is proposed to generate sample Gaussian mixture distributions according to prespecified overlap characteristics. Such methodology is useful for evaluating the performance of clustering algorithms. Our suggested approach involves deriving and calculating the exact overlap between every pair of clusters, measured as their total probability of misclassification, and then guiding the simulation of Gaussian components so that they satisfy the prespecified overlap characteristics. The algorithm is illustrated in two and five dimensions using contour plots and parallel distribution plots, respectively; we introduce and develop the latter to display mixture distributions in higher dimensions. We also study properties of the algorithm and the variability in the simulated mixtures. The utility of the suggested algorithm is demonstrated via a study of initialization strategies in Gaussian clustering. This article has supplementary material online.
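
To make the overlap measure concrete, the following is a minimal sketch for a deliberately simplified special case: two Gaussian components that share a spherical covariance sigma^2 I. In that case the probability that a draw from component i is assigned to component j has the closed form Phi(-d/2 + log(pi_j/pi_i)/d), where d is the distance between the means divided by sigma, and the pairwise overlap is the sum of the two misclassification probabilities. The function names (`misclassification_prob`, `pairwise_overlap`, `sigma_for_overlap`) are hypothetical and are not the MixSim interface. The article treats general covariance matrices, which this sketch does not attempt, and the bisection step is only one assumed way to reach a target overlap.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def misclassification_prob(mu_i, mu_j, sigma, pi_i=0.5, pi_j=0.5):
    """Probability that a draw from component i is assigned to component j.

    Assumes both components share the covariance sigma^2 * I and that the
    means are distinct (so the scaled distance d is positive).
    """
    d = math.dist(mu_i, mu_j) / sigma
    return normal_cdf(-d / 2.0 + math.log(pi_j / pi_i) / d)


def pairwise_overlap(mu_i, mu_j, sigma, pi_i=0.5, pi_j=0.5):
    """Overlap of a pair: the sum of the two misclassification probabilities."""
    return (misclassification_prob(mu_i, mu_j, sigma, pi_i, pi_j)
            + misclassification_prob(mu_j, mu_i, sigma, pi_j, pi_i))


def sigma_for_overlap(mu_i, mu_j, target, lo=1e-3, hi=1e3, iterations=200):
    """Find the common sigma giving a target overlap, for equal weights only.

    With equal weights the overlap equals 2 * Phi(-d / 2), which increases
    monotonically with sigma, so bisection on a log scale is sufficient.
    The target must lie strictly between the overlaps at lo and hi.
    """
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if pairwise_overlap(mu_i, mu_j, mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    sigma = sigma_for_overlap((0.0, 0.0), (3.0, 4.0), target=0.05)
    print(sigma, pairwise_overlap((0.0, 0.0), (3.0, 4.0), sigma))
```

Restricting the bisection to equal mixing proportions keeps the overlap monotone in sigma; with unequal proportions that monotonicity is not guaranteed, so a different search strategy would be needed.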
