Article

A Machine Learning Framework Identifies Plastid-Encoded Proteins Harboring C3 and C4 Distinguishing Sequence Information

Journal

GENOME BIOLOGY AND EVOLUTION
Volume 15, Issue 7, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gbe/evad129

Keywords

C4 photosynthesis; convergent evolution; PACMAD; grasses; plastome; machine learning


C4 photosynthesis is a remarkable example of convergent evolution. In this study, a machine learning approach was used to identify plastid genes with distinguishing information for C3 and C4 classification. Several key sequences and sites were identified that are highly predictive of C3/C4 status.
C4 photosynthesis is known to have at least 61 independent origins across plant lineages, making it one of the most notable examples of convergent evolution. Of these more than 60 independent origins, an estimated 22-24, encompassing more than 50% of all known C4 species, occur within the Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae (PACMAD) clade of the Poaceae family. This clade therefore contains many species well suited to studying the genomic changes associated with acquisition of the C4 photosynthetic trait. In this study, we take advantage of the growing availability of sequenced plastid genomes and employ a machine learning (ML) approach to screen for plastid genes harboring C3 and C4 distinguishing information in PACMAD species. We demonstrate that certain plastid-encoded protein sequences carry enough distinguishing information to train accurate ML C3/C4 classification models. Our RbcL-trained model, for example, yields a C3/C4 classifier with greater than 99% accuracy. Accurate prediction of photosynthetic type from individual sequences suggests that these sequence products play biologically relevant, and potentially different, roles in C3 versus C4 metabolism. With this ML framework, we have identified several key sequences and sites that are most predictive of C3/C4 status, including RbcL, subunits of the NAD(P)H dehydrogenase complex, and specific residues within them, further highlighting their potential significance in the evolution and/or maintenance of the C4 photosynthetic machinery. This general approach can be applied to uncover intricate associations in other, similar genotype-phenotype relationships.
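The abstract does not specify the model type or how sequences are encoded, so the following is only a minimal sketch of the general idea. It assumes one aligned protein sequence per species for a given gene, with binary C3/C4 labels. It swaps in two deliberately simple stand-ins for illustration. The first is a leave-one-out 1-nearest-neighbour classifier on Hamming distance, which gauges how much C3/C4 information a single gene carries. The second is a per-column majority-residue rule, which ranks individual alignment sites. All function names and the toy alignment are hypothetical. They are not real plastid data and are not the authors' pipeline.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def loo_nearest_neighbor_accuracy(seqs, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Hamming distance).

    Each sequence is held out in turn and given the label of its closest
    remaining sequence; ties go to the earliest such sequence in the list.
    """
    correct = 0
    for i, query in enumerate(seqs):
        others = [j for j in range(len(seqs)) if j != i]
        nearest = min(others, key=lambda j: hamming(query, seqs[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(seqs)


def site_scores(seqs, labels):
    """Score each alignment column by how well its residue alone predicts the label.

    Each residue seen in a column is mapped to its majority label in that
    column; the score is the fraction of sequences this rule labels correctly
    (an in-sample measure, so it is optimistic).
    """
    scores = []
    for col in range(len(seqs[0])):
        label_counts = defaultdict(Counter)  # residue -> Counter of labels
        for seq, label in zip(seqs, labels):
            label_counts[seq[col]][label] += 1
        correct = sum(c.most_common(1)[0][1] for c in label_counts.values())
        scores.append(correct / len(seqs))
    return scores


# Hypothetical toy alignment for one gene (NOT real plastid sequences).
toy_seqs = ["MSPQTE", "MSPQTE", "MAPQTE", "MSPKTD", "MSAKTD", "MSPKSD"]
toy_labels = ["C3", "C3", "C3", "C4", "C4", "C4"]

print("LOO 1-NN accuracy:", loo_nearest_neighbor_accuracy(toy_seqs, toy_labels))  # 1.0
scores = site_scores(toy_seqs, toy_labels)
ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
print("Most predictive columns (0-based):", ranked[:2])  # [3, 5]
```

Running the classifier once per gene alignment and comparing the resulting accuracies mirrors, in spirit, the per-gene screen described in the abstract. Because the site scores are computed on the same sequences they rank, they are in-sample and optimistic. A more careful screen would score sites on held-out data.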
