Article

Predicting liquid chromatographic retention times of peptides from the Drosophila melanogaster proteome by machine learning approaches

Journal

ANALYTICA CHIMICA ACTA
Volume 644, Issue 1-2, Pages 10-16

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.aca.2009.04.010

Keywords

Least-squares support vector machine; Random forest; Gaussian process; Peptide; Liquid chromatography; Quantitative structure-retention relationship

Funding

  1. National Project 863 Fund [2006AA02Z312]
  2. National Natural Science Fund [30371339, 30571748]

Three machine learning algorithms, least-squares support vector machine (LSSVM), random forest (RF) and Gaussian process (GP), were used to model the quantitative structure-retention relationship (QSRR) for predicting and explaining the retention behavior of proteome-wide peptides in reverse-phase liquid chromatography. Peptides were parameterized using the CODESSA approach, yielding 145 descriptors per peptide that cover diverse structural information, including constitutional, topological, geometrical and physicochemical properties. On this basis, the nonlinear LSSVM, RF and GP methods, together with a sophisticated linear method, partial least-squares regression (PLS), were employed in QSRR model development. The stability and predictive power of the constructed models were confirmed through a series of systematic validations: internal cross-validation, external testing and Monte Carlo cross-validation. The results show that regression models developed using the nonlinear approaches (LSSVM, RF and GP) predict better than linear PLS models. Considering that the retention times used in this work were measured on different columns and thus carry a relatively large uncertainty (reproducibility within 7%), the optimal statistics obtained from GP modeling are satisfactory, with coefficients of determination (R²) of 0.894 for the training set and 0.866 for the test set. (C) 2009 Elsevier B.V. All rights reserved.
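The validation scheme named in the abstract (Monte Carlo cross-validation scored by the coefficient of determination) can be illustrated with a minimal sketch. This is not the authors' code or data: the single descriptor, the synthetic "retention times", the one-variable least-squares line standing in for the LSSVM/RF/GP/PLS models, and all function names (`r_squared`, `fit_line`, `monte_carlo_cv`, `make_synthetic_data`) are assumptions made purely for illustration.

```python
import random
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one descriptor: y ~ slope * x + intercept.

    Stand-in model only; the paper's models are LSSVM, RF, GP and PLS.
    """
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def monte_carlo_cv(x, y, n_splits=50, test_fraction=0.2, seed=0):
    """Repeat random train/test splits and collect the test-set R^2 of each."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(2, int(round(n * test_fraction)))
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition on every repetition
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [slope * x[i] + intercept for i in test]
        scores.append(r_squared([y[i] for i in test], pred))
    return scores


def make_synthetic_data(n=200, seed=1):
    """Hypothetical data: one made-up descriptor and a noisy 'retention time'."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [3.0 * xi + 5.0 + rng.gauss(0.0, 2.0) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = make_synthetic_data()
    scores = monte_carlo_cv(x, y)
    print(f"mean R^2 = {statistics.fmean(scores):.3f}, "
          f"std = {statistics.stdev(scores):.3f}")
```

The spread of R² across the random splits gives a rough indication of model stability, which is the role Monte Carlo cross-validation plays alongside internal cross-validation and the external test set. Libraries such as scikit-learn provide comparable splitters (for example `ShuffleSplit`) for use with real regression models.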
