Article

Evolutionary Sparse Learning for Phylogenomics

MOLECULAR BIOLOGY AND EVOLUTION (2021)

Journal

MOLECULAR BIOLOGY AND EVOLUTION

Volume 38, Issue 11, Pages 4674-4682

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/molbev/msab227

Keywords

machine learning; phylogenetics; total evidence; phylogenomics; functional genomics

Categories

Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity

Funding

U.S. National Institutes of Health [GM-0126567-01]

Abstract

Evolutionary Sparse Learning (ESL) is a supervised machine learning approach with sparsity constraints. It builds models that use only the most important genomic loci to explain a phylogenetic hypothesis or the presence/absence of a trait. ESL models do not directly involve conventional parameters such as substitution rates or branch lengths; instead, they use the concordance of sequence variation in an alignment with the hypothesis of interest. ESL offers a natural way to combine different data types and has the potential to drive the development of new computational methods.

We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci (such as genes, proteins, genomic segments, and positions) as parameters. Using the Least Absolute Shrinkage and Selection Operator (LASSO), ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or the presence/absence of a trait. ESL models do not directly involve conventional parameters such as rates of substitution between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and nonmolecular data types and to incorporate biological and functional annotations of genomic loci in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics, particularly for identifying influential loci and sequences given a phylogeny and for building models to test hypotheses. ESL's fast computational times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics.
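To make the selection idea concrete, here is a minimal sketch of how a LASSO penalty can pick out alignment positions whose variation is concordant with a hypothesis. It is not the authors' implementation: it swaps in a plain least-squares LASSO solved by coordinate descent, and the toy alignment, the +1/-1 clade labels, the penalty `lam=0.1`, and the per-position score (sum of absolute weights over a position's one-hot columns) are all assumptions made for illustration. The sparsity scores defined in the paper may differ.

```python
# Illustrative sketch only: least-squares LASSO by coordinate descent,
# not the ESL authors' implementation.

def one_hot(alignment):
    """Turn each (position, base) pair at variable positions into a 0/1 column."""
    features = []
    for pos in range(len(alignment[0])):
        bases = sorted({seq[pos] for seq in alignment})
        if len(bases) > 1:  # invariant positions cannot discriminate
            features.extend((pos, base) for base in bases)
    X = [[1.0 if seq[pos] == base else 0.0 for pos, base in features]
         for seq in alignment]
    return features, X


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            norm = 0.0
            for i in range(n):
                # Prediction for sequence i with feature j left out.
                partial = b0 + sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
        # Unpenalized intercept: mean of the residuals computed without it.
        b0 = sum(y[i] - sum(X[i][k] * w[k] for k in range(p))
                 for i in range(n)) / n
    return b0, w


# Hypothetical toy data: the first three sequences belong to the clade of
# interest (+1), the last three do not (-1). Positions are 0-indexed.
alignment = ["AAGT", "AAGC", "AACT", "ACGT", "ACCC", "ACGC"]
y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

features, X = one_hot(alignment)
b0, w = lasso(X, y, lam=0.1)

# Assumed per-position score: sum of |weight| over that position's columns.
scores = {}
for (pos, _base), weight in zip(features, w):
    scores[pos] = scores.get(pos, 0.0) + abs(weight)

selected = sorted(pos for pos, s in scores.items() if s > 0)
print(selected)
```

In this toy alignment only position 1 (A inside the clade, C outside) splits the sequences exactly along the hypothesis, so it is the only position that keeps a nonzero weight; the penalty shrinks the weakly or non-concordant positions 2 and 3 to exactly zero. No substitution model or branch lengths enter the calculation.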
