Journal
JOURNAL OF MOLECULAR BIOLOGY
Volume 433, Issue 11, Pages -
Publisher
ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1016/j.jmb.2020.08.014
Keywords
circular dichroism; nucleic acids secondary structure prediction; XGBoost algorithm; Kohonen algorithm; nnet algorithm
Funding
- Ministry of Human Resource Development, Government of India
- BIRAC-SRISTI GYTI award [PMU_2017_010, PMU_2019_007]
- Indian Institute of Technology Hyderabad (IITH)
XGBoost and nnet algorithms were utilized to predict diverse secondary structures of nucleic acids, showing similar prediction accuracy of approximately 85% to 87%. Both algorithms can be employed for predicting hybrid nucleic acid topologies in the future.
Nucleic acids adopt a repertoire of conformations depending on sequence and environment. Circular dichroism (CD) is an essential and valuable tool for monitoring such secondary structural conformations of nucleic acids. Nonetheless, the CD spectral diversity associated with these structures makes it challenging to obtain quantitative information about the secondary structural content of a given CD spectrum. To this end, the competence of the extreme gradient boosting decision-tree (XGBoost), Kohonen and neural network (nnet) algorithms has been exploited here to predict the diverse secondary structures of nucleic acids. A curated library of 450 CD spectra corresponding to 16 different secondary structures of nucleic acids has been created and used as a training dataset. The hyperparameters of these algorithms have been optimized using holdout and k-fold (here, k = 5) cross-validation methods. For a test dataset of 150 CD spectra, the nnet and XGBoost algorithms exhibited similar prediction accuracies in the range of 85% to 87%, with XGBoost slightly higher. Thus, the nnet and XGBoost algorithms tested here can be employed for predicting hybrid nucleic acid topologies in the future. For the sake of accessibility, the entire process has been automated and implemented as a webserver called CD-NuSS (CD to nucleic acids secondary structure), which is freely accessible at https://project.iith.ac.in/cdnuss/. (C) 2020 Elsevier Ltd. All rights reserved.
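The k-fold (k = 5) cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python written for illustration, and the names `k_fold_indices`, `cross_validated_score`, and `evaluate` are hypothetical. It only shows how 450 training spectra could be partitioned into five folds, each serving once as the validation set while the other four are used for fitting.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled (train, validation) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i in range(k):
        validation = folds[i]
        # Training indices are every fold except the i-th.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def cross_validated_score(evaluate, n_samples, k=5, seed=0):
    """Average a user-supplied evaluate(train_idx, val_idx) -> float over k folds.

    `evaluate` is a hypothetical callback that would fit a model with one
    hyperparameter setting on the training indices and return its accuracy
    on the validation indices.
    """
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# 450 is the training-library size stated in the abstract.
splits = k_fold_indices(450, k=5)
```

In a hyperparameter search, `cross_validated_score` would be called once per candidate setting and the setting with the highest mean score retained; the held-out test spectra would not be touched until after that choice is made.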