Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS (2022)

Journal

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS

Volume 64, Issue 2, Pages 313-337

Publisher

WILEY

DOI: 10.1111/anzs.12370

Keywords

data augmentation; Gibbs sampling; latent variable models; Markov Chain Monte Carlo; non-Gaussian clusters; SNP genotyping

Automated Summary

Clustering is often a necessary first step in the statistical modelling and analysis of large and complex datasets, but existing clustering methods may fail to detect cluster components accurately when the data contain outliers or non-ellipsoidal cluster shapes. This article presents two Bayesian clustering approaches that aim to overcome these limitations, and demonstrates their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences.

Abstract

Clustering to find subgroups with common features is often a necessary first step in the statistical modelling and analysis of large and complex datasets. Although follow-up analyses often make use of complex statistical models that are appropriate for the specific application, most popular clustering approaches are either nonparametric or based on Gaussian mixture models and their variants, often for reasons of computational efficiency. Certain characteristics of the data that are common in modern scientific datasets, such as outliers or non-ellipsoidal cluster shapes, often cause these methods to fail to detect the cluster components accurately. In this article, we present two efficient and robust Bayesian clustering approaches that seek to overcome these limitations: a model-based 'tight' clustering approach for clustering points in the presence of outliers, and a hierarchical Laplace mixture-based approach for clustering heavy-tailed and otherwise non-normal cluster components. We illustrate their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences.
