Article

Molecular Descriptor Subset Selection in Theoretical Peptide Quantitative Structure-Retention Relationship Model Development Using Nature-Inspired Optimization Algorithms

Journal

ANALYTICAL CHEMISTRY
Volume 87, Issue 19, Pages 9876-9883

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.analchem.5b02349

Funding

  1. Basic Science Research Program through the National Research Foundation of Korea (NRF) - Ministry of Science, ICT and Future Planning [2013R1A1A1A05004852]
  2. National Research Foundation of Korea [2013R1A1A1A05004852] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

In this work, the performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for the development of quantitative structure-retention relationship (QSRR) models for 83 peptides originating from eight model proteins. A matrix of 423 descriptors was used as input. QSRR models based on the selected descriptors were built using partial least squares (PLS), and the root mean square error of prediction (RMSEP) served as the fitness function for descriptor selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, and that GA is superior because it combines the lowest computational cost with higher accuracy (RMSEP of 5.534%) and a smaller number of variables (nine descriptors). The GA-QSRR model was first validated through Y-randomization. It was then successfully validated with an external test set of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR model exhibited strong robustness. All sources of the model's error were identified, allowing further application of the developed methodology in proteomics.
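The selection loop described in the abstract (a GA searching over descriptor subsets, a PLS model fitted on each subset, RMSEP as the fitness) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism), the population size and mutation rate, the two-component PLS1 fitted with a small NIPALS routine, the single train/validation split, the absolute (not percentage) RMSEP, and the synthetic data standing in for the 423-descriptor matrix. The function names `pls1_fit_predict`, `rmsep`, and `ga_select` are hypothetical.

```python
import numpy as np

def pls1_fit_predict(X_tr, y_tr, X_te, n_comp=2):
    """Minimal PLS1 (NIPALS): fit on training data, predict the test rows."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    X, y = X_tr - x_mean, y_tr - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_comp, X.shape[1])):
        w = X.T @ y                      # weight vector: covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w /= norm
        t = X @ w                        # scores
        tt = t @ t
        p = X.T @ t / tt                 # X loadings
        c = (y @ t) / tt                 # y loading
        X = X - np.outer(t, p)           # deflate X and y
        y = y - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.full(X_te.shape[0], y_mean)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return (X_te - x_mean) @ beta + y_mean

def rmsep(y_true, y_pred):
    """Root mean square error of prediction (absolute units, an assumption)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def ga_select(X_tr, y_tr, X_val, y_val, pop_size=30, n_gen=40, p_mut=0.02, seed=0):
    """GA over boolean descriptor masks; fitness is validation RMSEP of PLS."""
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]

    def fitness(mask):
        if not mask.any():               # empty subsets are never acceptable
            return np.inf
        pred = pls1_fit_predict(X_tr[:, mask], y_tr, X_val[:, mask])
        return rmsep(y_val, pred)

    # Start from the full model as the baseline "best so far".
    best_mask = np.ones(n_feat, dtype=bool)
    best_fit = fitness(best_mask)
    pop = rng.random((pop_size, n_feat)) < 0.1   # sparse random initial masks
    for _ in range(n_gen):
        fits = np.array([fitness(ind) for ind in pop])
        i = int(np.argmin(fits))
        if fits[i] < best_fit:
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        # Binary tournament selection.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(fits[a] <= fits[b], a, b)]
        # Uniform crossover with randomly shuffled mates.
        mates = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(cross, parents, mates)
        # Bit-flip mutation, then elitism: keep the best mask found so far.
        children ^= rng.random((pop_size, n_feat)) < p_mut
        children[0] = best_mask
        pop = children
    return best_mask, best_fit

# Synthetic stand-in data: only the first three "descriptors" carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=90)
mask, err = ga_select(X[:60], y[:60], X[60:], y[60:])
print(int(mask.sum()), "descriptors selected, validation RMSEP:", round(err, 3))
```

Because the search starts from the full-descriptor model and keeps the best mask through elitism, the returned subset is never worse than the full PLS model on the validation split used for the fitness. That same split cannot then serve as an unbiased estimate of prediction error, which is one reason an external test set, such as the one described in the abstract, is needed.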
