Article

iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning

Journal

SAR AND QSAR IN ENVIRONMENTAL RESEARCH
Volume 32, Issue 4, Pages 317-331

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/1062936X.2021.1895884

Keywords

Origin of replication sites; mono-nucleotide binary encoding; dinucleotide-based spatial autocorrelation; elastic net; stacking learning

Funding

  1. National Natural Science Foundation of China [11601407]
  2. Natural Science Basic Research Program of Shaanxi Province of China [2019JQ-279]
  3. Fundamental Research Funds for the Central Universities [JB210715]

DNA replication is fundamental in all living organisms and plays a crucial role in cell division and gene expression. Identifying origin of replication sites (ORIs) is important for understanding gene regulation mechanisms and treating genetic diseases. A novel model, iORI-ENST, was developed using feature extraction, feature selection, and stacking learning to identify ORIs with high accuracy.
DNA replication is not only the basis of biological inheritance but also the most fundamental process in all living organisms. It plays a crucial role in the cell-division cycle and in the regulation of gene expression. Hence, the accurate identification of origin of replication sites (ORIs) is of great significance for further understanding the regulatory mechanism of gene expression and for treating genetic diseases. In this paper, a novel model named iORI-ENST is designed for identifying ORIs. First, features are extracted by combining mono-nucleotide binary encoding with dinucleotide-based spatial autocorrelation. Next, elastic net is used as the feature selection method to choose the optimal feature set. Stacking learning is then employed to classify ORIs and non-ORIs; the stacked learners are random forest, AdaBoost, gradient boosting decision tree, extra trees, and support vector machine. On the benchmark datasets S-1 and S-2, the model identifies ORI sites with accuracies of 91.41% and 95.07%, respectively. In addition, an independent dataset S-3 is used to verify the validity and transferability of the model, on which its accuracy reaches 91.10%. Compared with state-of-the-art methods, the model achieves better performance. The results show that the model is a feasible and effective tool for identifying ORIs. The source code and datasets are available at https://github.com/YingyingYao/iORI-ENST.
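
As an illustration of the first feature-extraction step named in the abstract, the following is a minimal sketch of mono-nucleotide binary encoding, in which each base is mapped to a 4-bit indicator vector and the vectors are concatenated along the sequence. The bit order (A, C, G, T), the function name `mono_nucleotide_binary_encode`, and the handling of non-ACGT characters are assumptions made here for illustration; the abstract does not specify them, and the authors' repository may differ.

```python
# Minimal sketch of mono-nucleotide binary (one-hot) encoding.
# ASSUMPTION: the A, C, G, T bit order below is illustrative only;
# the source abstract does not state the exact mapping used.
BINARY_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def mono_nucleotide_binary_encode(sequence):
    """Return a flat list of 0/1 features, four per nucleotide.

    Hypothetical helper: raises ValueError on any character other
    than A, C, G, or T (case-insensitive).
    """
    features = []
    for position, base in enumerate(sequence.upper()):
        if base not in BINARY_CODE:
            raise ValueError(
                f"unsupported base {base!r} at position {position}"
            )
        features.extend(BINARY_CODE[base])
    return features


if __name__ == "__main__":
    # A sequence of length n yields a feature vector of length 4 * n.
    print(mono_nucleotide_binary_encode("ACGT"))
```

Under this scheme a sequence of length n produces 4n binary features, so a feature selection step such as the elastic net mentioned in the abstract is a natural way to reduce the resulting dimensionality.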
