4.7 Article

A comparative analysis of machine learning classifiers for predicting protein-binding nucleotides in RNA sequences

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2022.06.036

Keywords

RNA-protein interactions; Protein-binding nucleotides; Machine learning; Stratified cross validation; Random forest classifier

Funding

  1. IIT Kharagpur
  2. CSIR, India

RNA-protein interactions play important roles in cellular machinery, but their molecular mechanism remains unclear. Studying binding interfaces is crucial for understanding molecular function and the aberrations caused by altered interactions. Because far less structural data than sequence data is available, efficient computational algorithms are needed to identify protein-binding nucleotides in RNA from sequence alone.
RNA-protein interactions play vital roles in driving the cellular machinery. Despite their significant involvement in several biological processes, the underlying molecular mechanism of RNA-protein interactions is still elusive. This may be due to the experimental difficulties in solving co-crystallized RNA-protein complexes. The inherent flexibility of RNA molecules, which lets them adopt different conformations, makes them functionally diverse. Their interactions with proteins have implications in RNA disease biology. Thus, the study of binding interfaces can provide mechanistic insight into molecular functioning and the aberrations caused by altered interactions. Moreover, high-throughput sequencing technologies have generated far more sequence data than the available structural data on RNA-protein complexes. In such a scenario, efficient computational algorithms are required to identify the protein-binding interfaces of RNA in the absence of known structures. We have investigated several machine learning classifiers and various features derived from nucleotide sequences to identify protein-binding nucleotides in RNA. We achieve the best performance with nucleotide-triplet and nucleotide-quartet feature-based random forest models. An overall accuracy of 84.8%, sensitivity of 83.2%, specificity of 86.1%, MCC of 0.70 and AUC of 0.93 are achieved. We have further implemented the developed models in a user-friendly webserver, Nucpred, which is freely accessible at http://www.csb.iitkgp.ac.in/applications/Nucpred/index. © 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
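
To make the feature and metric terminology in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "nucleotide-triplet" and "nucleotide-quartet" features can be approximated by k-mer counts (k = 3 or 4) in a window around each nucleotide; the window size (`flank`), the use of raw counts, and the function names `kmer_vocabulary`, `kmer_features`, and `binary_metrics` are all hypothetical choices made for illustration. The second function shows how accuracy, sensitivity, specificity, and MCC follow from confusion-matrix counts.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # RNA alphabet; DNA-style "T" is mapped to "U" below


def kmer_vocabulary(k):
    """Return all 4**k possible k-mers over the RNA alphabet, in a fixed order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_features(sequence, position, k=3, flank=5):
    """Count k-mers in a window of +/- `flank` nucleotides around `position`.

    Assumption: the window size and raw counts are illustrative only;
    the abstract does not specify how the features were constructed.
    """
    seq = sequence.upper().replace("T", "U")
    start = max(0, position - flank)
    end = min(len(seq), position + flank + 1)
    window = seq[start:end]
    counts = dict.fromkeys(kmer_vocabulary(k), 0)
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        if kmer in counts:  # skip k-mers with ambiguous bases such as N
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmer_vocabulary(k)]


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    triplets = kmer_features("ACGUACGU", position=3, k=3, flank=2)
    print(len(triplets), sum(triplets))
    print(binary_metrics(tp=8, tn=9, fp=1, fn=2))
```

Feature vectors of this kind, one per nucleotide with a binding or non-binding label, could then be passed to any standard classifier, for example a random forest evaluated with stratified cross-validation as named in the keywords.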
