4.7 Article

i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting

期刊

FRONTIERS IN PLANT SCIENCE
卷 13, 期 -, 页码 -

出版社

FRONTIERS MEDIA SA
DOI: 10.3389/fpls.2022.845835

关键词

N6-methyladenine; plant genomes; cross-species; feature encoding; ensemble learning

资金

  1. National Natural Science Foundation of China [61901103, 61801432, 61771165]
  2. Natural Science Foundation of Heilongjiang Province [LH2019F002]
  3. Postdoctoral Science Foundation of Heilongjiang Province of China [LBH-Z19106]

向作者/读者索取更多资源

This article presents a novel model named i6mA-vote for predicting 6mA sites in plants. The model utilizes different coding strategies for feature extraction and combines multiple base-classifiers under a majority voting strategy. The performance of i6mA-vote is evaluated on three different plant genomes, showing its effectiveness in predicting 6mA sites across species. The results also highlight the importance of nucleotide position in identifying 6mA sites.
DNA N6-Methyladenine (6mA) is a common epigenetic modification, which plays some significant roles in the growth and development of plants. It is crucial to identify 6mA sites for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites of plants. Firstly, DNA sequences were coded into six feature vectors with diverse strategies based on density, physicochemical properties, and position of nucleotides, respectively. To find the best coding strategy, the feature vectors were compared on several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA sites identification. Thus, the dinucleotide one-hot strategy which can describe position characteristics of nucleotides well was employed to extract DNA features in our method. Secondly, DNA sequences of Rosaceae were divided into a training dataset and a test dataset randomly. Finally, i6mA-vote was constructed by combining five different base-classifiers under a majority voting strategy and trained on the Rosaceae training dataset. The i6mA-vote was evaluated on the task of predicting 6mA sites from the genome of the Rosaceae, Rice, and Arabidopsis separately. In Rosaceae, the performances of i6mA-vote were 0.955 on accuracy (ACC), 0.909 on Matthew correlation coefficients (MCC), 0.955 on sensitivity (SN), and 0.954 on specificity (SP). Those indicators, in the order of ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on Rice while they were 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to the indicators, our method was effectiveness and better than other concerned methods. The results also illustrated that i6mA-vote does not only well in 6mA sites prediction of intraspecies but also interspecies plants. Moreover, it can be seen that the specificity is distinctly lower than the sensitivity in Rice while it is just the opposite in Arabidopsis. It may be resulted from sequence similarity among Rosaceae, Rice and Arabidopsis.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据