Article

A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification

Journal

ALGORITHMS FOR MOLECULAR BIOLOGY
Volume 12

Publisher

BIOMED CENTRAL LTD
DOI: 10.1186/s13015-017-0112-1

Keywords

Mixture model; Beta distribution; Maximum likelihood; Method of moments; EM algorithm; Differential methylation; Classification

Funding

  1. Federal Ministry of Education and Research (BMBF) [01KU1216]
  2. Mercator Research Center Ruhr (MERCUR) [Pe-2013-0012]
  3. German Research Foundation (DFG), Collaborative Research Center [SFB 876]

Background: Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1.

Methods: While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm combines latent variables with the method of moments instead of maximum likelihood, which has computational advantages over the popular EM algorithm.

Results: As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels. We also demonstrate that we can accurately infer the number of mixture components.

Conclusions: The hybrid algorithm between likelihood-based component un-mixing and moment-based parameter estimation is a robust and efficient method for beta mixture estimation. We provide an implementation of the method (betamix) as open source software under the MIT license.
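
To illustrate why a moment-based update avoids the singularities mentioned in the abstract, here is a minimal sketch of a weighted method-of-moments fit for a single beta component. It is not taken from the betamix software: the function name `beta_from_weighted_moments` and the example data are hypothetical, and the weights stand in for the latent-variable responsibilities that a mixture algorithm would supply. The sketch uses the standard moment relations for a beta distribution with mean m and variance v, namely alpha = m * phi and beta = (1 - m) * phi with phi = m(1 - m)/v - 1.

```python
def beta_from_weighted_moments(xs, weights=None):
    """Estimate (alpha, beta) of a beta distribution from weighted moments.

    Hypothetical helper for illustration only. `weights` plays the role of
    per-observation component responsibilities; uniform weights are used
    when none are given.
    """
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted sample mean and (biased) weighted sample variance.
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total

    # A beta distribution requires 0 < var < mean * (1 - mean).
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("moments are not compatible with a beta distribution")

    phi = mean * (1.0 - mean) / var - 1.0
    return mean * phi, (1.0 - mean) * phi


# Observations exactly at 0 and 1 only enter through sums of x and x**2,
# so no logarithm of 0 is ever evaluated.
levels = [0.0, 0.1, 0.2, 0.3, 1.0]
alpha, beta = beta_from_weighted_moments(levels)
print(alpha, beta)
```

In a mixture setting, one would presumably alternate between computing responsibilities for each component and calling an update like this once per component with those responsibilities as weights; how the published algorithm handles that step in detail is not specified in the abstract.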
