Article

Informative metabolites identification by variable importance analysis based on random variable combination

Journal

METABOLOMICS
Volume 11, Issue 6, Pages 1539-1551

Publisher

SPRINGER
DOI: 10.1007/s11306-015-0803-x

Keywords

Variable importance; Combination effect; Informative metabolites; Partial least squares-linear discriminant analysis

Funding

  1. National Nature Foundation Committee of P.R. China [21275164, 21465016, 21175157, 21375151]
  2. Fundamental Research Funds for the Central Universities of Central South University [2014zzts014]

The main goal of metabolomics research is to reveal informative metabolites or biomarkers, which can be treated as a variable selection problem. Several methods, such as the regression coefficient (RC), weights and variable importance in projection (VIP), have been widely used to assess variable importance when building a partial least squares-linear discriminant analysis (PLS-LDA) classification model. A set of metabolites can then be selected by applying a threshold to the metabolite ranking. However, these methods do not take into account the combination effect among subsets of variables, which leads to bias in the results. In this work, a strategy named variable importance analysis based on random variable combination (VIAVC) is developed for the statistical assessment of variable importance. The VIAVC framework consists of three main parts: (1) a novel variable sampling method, binary matrix resampling, generates a population of different variable combinations while guaranteeing that each variable is selected with the same probability; (2) the importance of each variable is assessed by the percent decrease or increase in the area under the receiver operating characteristic curve (AUC) when that variable is excluded from PLS-LDA modeling; (3) informative variables are retained iteratively, and the ranking of the final remaining informative variables is output. Applications to three metabolic datasets show that VIAVC performs better than other methods, including RC, VIP and subwindow permutation analysis. The MATLAB code implementing VIAVC is available in the supplemental materials.
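The three steps of VIAVC summarized above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' MATLAB implementation, and several details are assumptions made for illustration only: the PLS regression prediction is used directly as the discriminant score (a simplification of the LDA step in PLS-LDA; AUC depends only on score ranks), AUC is computed on a single fixed validation split, importance is the mean percent AUC decrease over all sampled subsets that contain the variable, and a variable is kept while that mean is positive. All function names (`binary_matrix_resampling`, `viavc_sketch`, etc.) are hypothetical.

```python
import numpy as np

def binary_matrix_resampling(K, p, rng):
    """K x p 0/1 matrix: each column holds exactly K//2 ones and is shuffled
    independently, so every variable appears in the same number of subsets."""
    M = np.zeros((K, p), dtype=int)
    M[: K // 2, :] = 1
    for j in range(p):
        M[:, j] = rng.permutation(M[:, j])
    return M

def pls_fit(X, y, n_comp):
    """PLS1 via NIPALS; returns (x_mean, y_mean, coefficients B)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain
            break
        w = w / norm
        t = Xc @ w
        tt = t @ t
        p_load, q = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p_load), yc - q * t  # deflation
        W.append(w); P.append(p_load); Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q

def auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    diff = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def subset_auc(data, cols, n_comp=2):
    """Validation AUC of a PLS model built on the variables in `cols`."""
    X_tr, y_tr, X_te, y_te = data
    if len(cols) == 0:
        return 0.5  # no variables: chance-level classifier
    cols = np.asarray(cols)
    xm, ym, B = pls_fit(X_tr[:, cols], y_tr.astype(float), min(n_comp, len(cols)))
    return auc((X_te[:, cols] - xm) @ B + ym, y_te)

def variable_importance(data, variables, K, rng):
    """Mean percent AUC decrease when each variable is removed from a subset."""
    p = len(variables)
    M = binary_matrix_resampling(K, p, rng)
    change, counts = np.zeros(p), np.zeros(p)
    for row in M:
        included = [variables[j] for j in range(p) if row[j]]
        base = subset_auc(data, included)
        for j in range(p):
            if row[j]:
                reduced = [v for v in included if v != variables[j]]
                drop = subset_auc(data, reduced)
                change[j] += 100.0 * (base - drop) / max(base, 1e-12)
                counts[j] += 1
    return change / np.maximum(counts, 1)

def viavc_sketch(data, K=100, max_iter=10, seed=0):
    """Iteratively drop variables whose removal does not lower AUC on average."""
    rng = np.random.default_rng(seed)
    variables = list(range(data[0].shape[1]))
    for _ in range(max_iter):
        imp = variable_importance(data, variables, K, rng)
        keep = imp > 0
        if keep.all() or not keep.any():
            break
        variables = [v for v, k in zip(variables, keep) if k]
    else:  # loop ran out: score the final variable set once more
        imp = variable_importance(data, variables, K, rng)
    order = np.argsort(-imp)
    return [variables[i] for i in order], imp[order]
```

As a usage example, on synthetic data where only columns 0 and 3 differ between classes, those two columns would be expected to rank first:

```python
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 8))
X[:, 0] += 1.5 * y
X[:, 3] -= 1.5 * y
idx = rng.permutation(80)
data = (X[idx[:40]], y[idx[:40]], X[idx[40:]], y[idx[40:]])
ranked, scores = viavc_sketch(data, K=50)
```

Because binary matrix resampling fixes the number of ones per column, every variable is evaluated in exactly K/2 random contexts, which is what lets its importance reflect combination effects with many different partner variables rather than a single full model.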
