Article

Model-Based Clustering with Measurement or Estimation Errors

GENES (2020)

Journal

GENES

Volume 11, Issue 2

Publisher

MDPI

DOI: 10.3390/genes11020185

Keywords

Gaussian finite mixture model; clustering analysis; uncertainty; expectation-maximization algorithm; classification boundary; gene expression; RNA-seq

Funding

National Institute of General Medical Sciences of the National Institutes of Health [R01 GM104977]

Abstract

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we found that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors. The degree of improvement depends on factors such as the distribution of the error covariance matrices.
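The dependence of the classification boundary on the error covariance can be illustrated with a minimal univariate sketch. It assumes that the measurement error is zero-mean Gaussian with a known variance and that the mixture parameters are already known (in practice they would be estimated, for example by an expectation-maximization algorithm). Under these assumptions an observation from component k has variance equal to the component variance plus its own error variance. The function names and the parameter values below are hypothetical and are not taken from the MCLUST or MCLUST-ME software.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def membership_probs(y, err_var, weights, means, variances):
    """Posterior component probabilities for one observation.

    The true value is assumed to come from component k with variance
    variances[k]; an independent error with variance err_var is added,
    so the observed value has variance variances[k] + err_var.
    """
    dens = [
        w * normal_pdf(y, m, v + err_var)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(dens)
    return [d / total for d in dens]


def classify(y, err_var, weights, means, variances):
    """Index of the component with the highest posterior probability."""
    probs = membership_probs(y, err_var, weights, means, variances)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical two-component mixture (illustrative values only).
WEIGHTS = [0.5, 0.5]
MEANS = [0.0, 3.0]
VARIANCES = [0.25, 4.0]

if __name__ == "__main__":
    y = 1.2
    for err_var in (0.0, 1.0):
        probs = membership_probs(y, err_var, WEIGHTS, MEANS, VARIANCES)
        label = classify(y, err_var, WEIGHTS, MEANS, VARIANCES)
        print(f"error variance {err_var}: probabilities {probs}, component {label}")
```

With these illustrative values, the same observed value 1.2 is assigned to the second component when its error variance is 0 and to the first component when its error variance is 1, which is the sense in which each error variance has its own classification boundary.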
