4.7 Article

Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa202

Keywords

DNA N-6-methyladenine modification; prediction model; feature extraction; two-step feature optimization; meta-predictor

Funding

  1. JSPS [19F19377, 19H04208]
  2. Basic Science Research Program through the National Research Foundation of Korea - Ministry of Science and ICT (MSIT) [2018R1D1A1B07049572, 2019R1I1A1A01062260, 2020R1A4A4079722, 2020M3E5D9080661]
  3. Grants-in-Aid for Scientific Research [19H04208, 19F19377] Funding Source: KAKEN
  4. National Research Foundation of Korea [00000002, 2020M3E5D9080661, 2019R1I1A1A01062260] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

DNA N-6-methyladenine (6mA) is a crucial epigenetic modification responsible for various cellular processes. Accurate identification of 6mA sites is challenging in genome analysis, and existing machine learning models have limited practical application across different plant species. In this study, Meta-i6mA, combining 30 baseline models, performed exceptionally well in independent tests on Rosaceae, rice, and Arabidopsis thaliana, showing higher Matthews correlation coefficient values compared to existing predictors.
DNA N-6-methyladenine (6mA) is an important epigenetic modification involved in various cellular processes. Accurately identifying 6mA sites is a challenging task in genome analysis and supports an understanding of their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but most of them were not tested on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes, based on physicochemical and position-specific information, that possess high discriminative capability. The resulting feature sets were fed into six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naive Bayes and AdaBoost). The Rosaceae genome was used to train these classifiers, which generated 30 baseline models. To integrate their individual strengths, we propose Meta-i6mA, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA showed high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed a user-friendly online web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.
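
As a rough illustration of the structure described in the abstract, the sketch below enumerates the 5 x 6 = 30 baseline models, collects their predicted probabilities into one vector, and computes the Matthews correlation coefficient. It is not the authors' implementation. The encoding names are placeholders, since the abstract does not name the five selected schemes, and `meta_predict` is a hypothetical stand-in that simply averages probabilities where a trained meta-classifier would normally go.

```python
import math
from itertools import product

# The six learners named in the abstract.
CLASSIFIERS = ["random_forest", "svm", "extra_trees",
               "logistic_regression", "naive_bayes", "adaboost"]
# Placeholder names: the abstract does not list the five selected encodings.
ENCODINGS = [f"encoding_{i}" for i in range(1, 6)]

# One baseline model per (encoding, classifier) pair: 5 x 6 = 30.
BASELINES = list(product(ENCODINGS, CLASSIFIERS))


def meta_features(baseline_probs):
    """Arrange per-baseline 6mA probabilities into one 30-dimensional vector."""
    return [baseline_probs[pair] for pair in BASELINES]


def meta_predict(features, threshold=0.5):
    """Assumed stand-in for the meta-predictor: mean probability, thresholded.

    A trained meta-classifier would normally replace this averaging rule.
    """
    return 1 if sum(features) / len(features) >= threshold else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = 6mA site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, `meta_predict(meta_features({pair: 0.8 for pair in BASELINES}))` labels a sequence as a 6mA site when every baseline reports a probability of 0.8, and `mcc(y_true, y_pred)` can then score the predictions against the labels of an independent test set.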
