4.4 Article

The influence of scaling metabolomics data on model classification accuracy

期刊

METABOLOMICS
卷 11, 期 3, 页码 684-695

出版社

SPRINGER
DOI: 10.1007/s11306-014-0738-7

关键词

Pre-treatment; Metabolomics; Classification accuracy; Scaling; Chemometrics

资金

  1. PhastID European project within Seventh Framework Programme for Research and Technological Development [258238]
  2. Engineering and Physical Sciences Research Council [GR/S96685/01] Funding Source: researchfish

向作者/读者索取更多资源

Correctly measured classification accuracy is an important aspect not only to classify pre-designated classes such as disease versus control properly, but also to ensure that the biological question can be answered competently. We recognised that there has been minimal investigation of pre-treatment methods and its influence on classification accuracy within the metabolomics literature. The standard approach to pre-treatment prior to classification modelling often incorporates the use of methods such as autoscaling, which positions all variables on a comparable scale thus allowing one to achieve separation of two or more groups (target classes). This is often undertaken without any prior investigation into the influence of the pre-treatment method on the data and supervised learning techniques employed. Whilst this is useful for deriving essential information such as predictive ability or visual interpretation in many cases, as shown in this study the standard approach is not always the most suitable option available. Here, a study has been conducted to investigate the influence of six pre-treatment methods-autoscaling, range, level, Pareto and vast scaling, as well as no scaling-on four classification models, including: principal components-discriminant function analysis (PC-DFA), support vector machines (SVM), random forests (RF) and k-nearest neighbours (kNN)-using three publically available metabolomics data sets. We have demonstrated that undertaking different pre-treatment methods can greatly affect the interpretation of the statistical modelling outputs. The results have shown that data pre-treatment is context dependent and that there was no single superior method for all the data sets used. Whilst we did find that vast scaling produced the most robust models in terms of classification rate for PC-DFA of both NMR spectroscopy data sets, in general we conclude that both vast scaling and autoscaling produced similar and superior results in comparison to the other four pre-treatment methods on both NMR and GC-MS data sets. It is therefore our recommendation that vast scaling is the primary pre-treatment method to use as this method appears to be more stable and robust across all the different classifiers that were conducted in this study.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.4
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据