Article

Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features

BMC BIOINFORMATICS (2021)

Journal

BMC BIOINFORMATICS

Volume 22, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12859-021-04431-x

Keywords

Eukaryotes; DNA replication; Origin; TF-IDF; Multi-layer perceptron; STREME

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Funding

National Natural Science Foundation of China [61762026, 61462018]
Guangxi Natural Science Foundation [2017GXNSFAA198278]
Innovation Project of GUET Graduate Education [2019YCXS056]
GUET Excellent Graduate Thesis Program [18YJPYSS14]

AI Summary

The authors studied origins of DNA replication in seven eukaryotic species, proposing a prediction method for each species and a feature extraction method based on TF-IDF. After 100 runs of fivefold cross validation, prediction accuracy on the training sets exceeded 90% for every species. The models of different species could also predict each other with high accuracy, and the species share a common motif.

Abstract

Background: The origin of replication is the site where DNA replication starts, a vital part of passing genetic information from parents to offspring. Accurately identifying origins of replication has great application value in the diagnosis and treatment of diseases related to errors in genetic information, but traditional biological experimental methods are time-consuming and laborious.

Results: We studied the origin of replication in a variety of eukaryotes and proposed a dedicated prediction method for each species. We collected data from seven species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. A two-step method was then used for feature selection. After comparing a variety of traditional machine learning classification models, we adopted the multi-layer perceptron as the classification algorithm. The data and code involved in the experiment are available at .

Conclusions: After 100 runs of fivefold cross validation, the prediction accuracy on the training sets of the seven species above reached 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86% and 96.72%, respectively. This indicates that the methods we designed outperform other methods. In addition, our experiments reveal that the models of multiple species can predict each other with high accuracy, and the STREME results show that they share a certain common motif.
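To make the idea of a TF-IDF-based sequence feature concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each DNA sequence is treated as a document and each overlapping k-mer as a term, and it uses one common smoothed IDF variant. The function names `kmers` and `tfidf_features`, the choice `k=2`, and the toy sequences are all hypothetical; the paper's exact definition may differ.

```python
import math
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return the overlapping k-mers of a DNA sequence, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_features(sequences, k=2):
    """Map each sequence to a TF-IDF vector over all 4**k k-mers of ACGT.

    Assumed definitions (illustrative, not taken from the paper):
      tf  = occurrences of the k-mer / number of k-mers in the sequence
      idf = ln((1 + N) / (1 + df)) + 1, with N sequences and df the
            number of sequences that contain the k-mer
    k-mers containing other characters (e.g. N) are ignored as features.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    tokenized = [kmers(s, k) for s in sequences]
    n_docs = len(tokenized)

    # Document frequency: in how many sequences does each k-mer occur?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    features = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for short input
        features.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, features


if __name__ == "__main__":
    toy = ["ATATATGC", "GCGCGCAT", "ATGCATGC"]  # made-up sequences
    vocab, X = tfidf_features(toy, k=2)
    for seq, row in zip(toy, X):
        top = max(range(len(vocab)), key=lambda i: row[i])
        print(seq, "highest-weighted 2-mer:", vocab[top])
```

Vectors like these could be concatenated with other sequence features, reduced by feature selection, and passed to a classifier such as a multi-layer perceptron, as the abstract describes.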
