Article

Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment

JOURNAL OF COMPUTATIONAL BIOLOGY (2013)

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 20, Issue 4, Pages 259-279

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2013.0007

Keywords

sequence alignment; accuracy assessment; parameter choice; machine learning; feature functions

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

U.S. National Science Foundation [IIS-1050293, IIS-1217886]
U.S. National Science Foundation through the University of Arizona IGERT in Comparative Genomics [DGE-0654435]
National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems [1217886]

Abstract

We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality.
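The two ideas in the abstract, an estimator that is a polynomial in alignment features and an advisor that picks the parameter choice whose alignment has the highest estimated accuracy, can be illustrated with a small Python sketch. This is not the Facet implementation: the function names, the two feature values per alignment, and the coefficients are all hypothetical, and the learning step (fitting the coefficients against true accuracy) is omitted, so the coefficients are simply assumed to be given.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(features, degree):
    """Return every monomial of the feature values up to `degree`.

    The constant term 1.0 comes first, then the degree-1 terms, and so on.
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(features, d):
            terms.append(prod(combo))
    return terms


def estimate_accuracy(features, coefficients, degree=1):
    """Evaluate a polynomial estimator: a weighted sum of the monomials."""
    terms = polynomial_terms(features, degree)
    if len(terms) != len(coefficients):
        raise ValueError("expected %d coefficients, got %d"
                         % (len(terms), len(coefficients)))
    return sum(c * t for c, t in zip(coefficients, terms))


def advise(candidates, coefficients, degree=1):
    """Pick the parameter choice whose alignment scores highest.

    `candidates` maps a parameter-choice label to the feature vector of the
    alignment computed under that choice (both hypothetical here).
    """
    return max(
        candidates,
        key=lambda choice: estimate_accuracy(candidates[choice],
                                             coefficients, degree),
    )


# Hypothetical example: two features per alignment, a linear estimator.
coefficients = [0.1, 0.5, 0.4]          # assumed, not learned here
candidates = {
    "params_A": [0.6, 0.2],             # made-up feature values
    "params_B": [0.4, 0.9],
}
best = advise(candidates, coefficients)
```

With these made-up numbers the advisor selects `params_B`, since its estimate (about 0.66) exceeds that of `params_A` (about 0.48). Raising `degree` to 2 adds the pairwise feature products, which corresponds to the abstract's point (b) about estimators beyond linear combinations of features.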
