Article

Information entropy-based differential evolution with extremely randomized trees and LightGBM for protein structural class prediction

Journal

Applied Soft Computing
Volume 136

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2023.110064

Keywords

Protein structural class; Prediction model; Feature selection; Evolutionary algorithm; Single objective optimization

The discovery of protein tertiary structure is the basis of current genetic engineering, medicinal design, and other biological applications. Protein structural class plays a significant role in the folding and functional analysis of protein tertiary structure. However, new amino acid sequences are being discovered far faster than their tertiary structures can be determined, and existing methods for confirming protein folding cannot keep pace with massive sequence collections or the demands of protein engineering. Highly accurate prediction on low-similarity protein datasets is therefore particularly critical for generating the corresponding tertiary structure from the primary structure. In this paper, we construct a novel super-large-scale feature set from the primary structure, based on secondary structure, evolutionary information, chemical properties, and global descriptors. These diverse and massive features are used to predict the protein structural class with a novel feature selection algorithm and a gradient boosting decision tree model. To test the effectiveness and robustness of the proposed method, IDEGBM, we apply 10-fold cross-validation to four benchmark datasets: 25PDB, FC699, D1189, and D640. Experimental results show that our method outperforms other state-of-the-art prediction models in both accuracy and efficiency. Furthermore, a representative protein is used to demonstrate that IDEGBM can improve the conformation prediction of protein tertiary structure. © 2023 Elsevier B.V. All rights reserved.
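The abstract describes the overall recipe (an evolutionary feature selector scored by 10-fold cross-validation) without its details, so the following is only a minimal sketch under stated assumptions, not the authors' IDEGBM. It uses the classic DE/rand/1/bin scheme on real-valued vectors in [0, 1], treats a gene above 0.5 as "feature selected", omits the information-entropy component, and swaps in a nearest-centroid classifier for the extremely randomized trees and LightGBM models so that it runs on the Python standard library alone. The function names, the 0.01 per-feature penalty, and the toy data are hypothetical choices for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, mask, folds):
    """Mean k-fold accuracy of a nearest-centroid classifier restricted to the
    features where mask is True (hypothetical stand-in for the paper's tree models)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    accs = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        # Per-class mean vector over the selected features, fitted on the training fold only.
        centroids = {}
        for c in set(y[i] for i in train):
            members = [i for i in train if y[i] == c]
            centroids[c] = [mean(X[i][j] for i in members) for j in cols]
        correct = 0
        for i in test:
            pred = min(
                centroids,
                key=lambda c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c])),
            )
            correct += pred == y[i]
        accs.append(correct / len(test))
    return mean(accs)


def de_feature_selection(X, y, pop_size=10, generations=20, F=0.5, CR=0.9,
                         penalty=0.01, seed=0):
    """DE/rand/1/bin over vectors in [0, 1]; a feature is selected when its gene
    exceeds 0.5. Fitness = 10-fold CV accuracy minus a hypothetical per-feature penalty."""
    rng = random.Random(seed)
    d = len(X[0])
    folds = k_fold_indices(len(X), k=10, seed=seed)

    def fitness(vec):
        mask = [v > 0.5 for v in vec]
        return cv_accuracy(X, y, mask, folds) - penalty * sum(mask)

    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    scores = [fitness(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than the target.
            a, b, c = rng.sample([p for idx, p in enumerate(pop) if idx != i], 3)
            j_rand = rng.randrange(d)  # guarantees at least one gene comes from the mutant
            trial = []
            for j in range(d):
                if rng.random() < CR or j == j_rand:
                    trial.append(min(1.0, max(0.0, a[j] + F * (b[j] - c[j]))))
                else:
                    trial.append(pop[i][j])
            # Greedy selection: keep the trial if it is at least as fit as the target.
            s = fitness(trial)
            if s >= scores[i]:
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return [v > 0.5 for v in pop[best]], scores[best]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes; features 1-4 are pure noise.
    data_rng = random.Random(1)
    y = [i % 2 for i in range(60)]
    X = [[3.0 * label + data_rng.gauss(0, 0.5)] + [data_rng.gauss(0, 1) for _ in range(4)]
         for label in y]
    mask, score = de_feature_selection(X, y)
    print("selected:", [j for j, keep in enumerate(mask) if keep], "fitness:", round(score, 3))
```

On the toy data the selector should keep feature 0 and tend to drop the noise features, since each extra feature costs the penalty without improving accuracy. Replacing `cv_accuracy` with scikit-learn's `ExtraTreesClassifier` or LightGBM's `LGBMClassifier` would bring the sketch closer to the models named in the title, at the cost of third-party dependencies.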
