Article

Automatic selection of the number of clusters using Bayesian clustering and sparsity-inducing priors

Journal

ECOLOGICAL APPLICATIONS
Volume 32, Issue 3, Pages -

Publisher

WILEY
DOI: 10.1002/eap.2524

Keywords

Bayesian nonparametrics; biogeographic region model; clustering; mixture model; movement ecology; species archetype model

Funding

  1. U.S. Department of Agriculture National Institute of Food and Agriculture McIntire-Stennis project [1005163]
  2. U.S. National Science Foundation [1458034, 2040819]
  3. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) [1575316]
  4. Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT) [23/200.715/2013]
  5. Direct For Biological Sciences; Div Of Biological Infrastructure [1458034] Funding Source: National Science Foundation
  6. Div Of Biological Infrastructure; Direct For Biological Sciences [2040819] Funding Source: National Science Foundation
  7. NIFA [811996, 1005163] Funding Source: Federal RePORTER


This article highlights the importance and advantages of using Bayesian clustering methods in ecological and environmental sciences, and proposes the use of sparsity-inducing priors to determine the number of groups. Through application examples using simulated and real data, it demonstrates that this approach can successfully recover the true number of groups.
Clustering is a ubiquitous task in ecological and environmental sciences, and multiple methods have been developed for this purpose. Because these clustering methods typically require users to specify the number of groups a priori, the standard approach is to run the algorithm for different numbers of groups and then choose the optimal number using a criterion (e.g., AIC or BIC). This approach has two drawbacks: running these clustering algorithms multiple times (i.e., once for each candidate number of groups) can be computationally expensive, and some of these information criteria can overestimate the number of groups. To address these concerns, we advocate for the use of sparsity-inducing priors within a Bayesian clustering framework. In particular, we highlight how the truncated stick-breaking (TSB) prior, a prior commonly adopted in Bayesian nonparametrics, can be used to simultaneously determine the number of groups and estimate model parameters for a wide range of Bayesian clustering models without requiring the fitting of multiple models. We illustrate the ability of this prior to successfully recover the true number of groups for three clustering models (two types of mixture models, applied to GPS movement data and species occurrence data, as well as the species archetype model) using simulated data in the context of movement ecology and community ecology. We then apply these models to armadillo movement data in Brazil, plant occurrence data from Alberta (Canada), and bird occurrence data from North America. Given the ubiquity of clustering and the associated challenge of determining the number of groups, we believe that many applications in ecological and environmental sciences will benefit from Bayesian clustering methods with sparsity-inducing priors. Two R packages, EcoCluster and bayesmove, are provided that enable the straightforward fitting of these models with the TSB prior.
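
To make the idea of a sparsity-inducing TSB prior concrete, the following is a minimal sketch of how group weights can be drawn from a truncated stick-breaking construction. It is not taken from EcoCluster or bayesmove. It assumes the standard form in which each break is Beta(1, gamma) and the final break takes the remaining stick; the names `tsb_weights`, `effective_groups`, `n_max`, `gamma`, and the 0.01 threshold are illustrative choices, not the authors' notation.

```python
import random


def tsb_weights(n_max, gamma, rng):
    """Draw group weights from a truncated stick-breaking prior (sketch).

    n_max : maximum number of groups (the truncation level)
    gamma : concentration parameter (assumed Beta(1, gamma) breaks);
            smaller values push most of the mass onto the first few groups
    rng   : a random.Random instance, so draws are reproducible
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any group
    for k in range(n_max):
        # The last break takes whatever is left, so the weights sum to 1.
        v = 1.0 if k == n_max - 1 else rng.betavariate(1.0, gamma)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def effective_groups(weights, threshold=0.01):
    """Count groups whose weight exceeds an (arbitrary) threshold."""
    return sum(w > threshold for w in weights)


def mean_effective_groups(n_max, gamma, n_draws, rng):
    """Average number of non-negligible groups over repeated prior draws."""
    total = sum(effective_groups(tsb_weights(n_max, gamma, rng))
                for _ in range(n_draws))
    return total / n_draws


if __name__ == "__main__":
    rng = random.Random(42)
    for gamma in (0.1, 1.0, 10.0):
        avg = mean_effective_groups(n_max=10, gamma=gamma, n_draws=500, rng=rng)
        print(f"gamma={gamma}: mean non-negligible groups = {avg:.2f}")
```

Under these assumptions, a small `gamma` leaves most of the ten candidate groups with negligible weight, which is the sense in which the prior is sparsity-inducing: the model is fitted once with a generous maximum number of groups, and superfluous groups are emptied rather than selected out by refitting.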
