☆ 4.4 Article

The influence of scaling metabolomics data on model classification accuracy

METABOLOMICS (2015)

期刊

METABOLOMICS

卷 11, 期 3, 页码 684-695

出版社

SPRINGER

DOI: 10.1007/s11306-014-0738-7

关键词

Pre-treatment; Metabolomics; Classification accuracy; Scaling; Chemometrics

类别

Endocrinology & Metabolism

资金

PhastID European project within Seventh Framework Programme for Research and Technological Development [258238]
Engineering and Physical Sciences Research Council [GR/S96685/01] Funding Source: researchfish

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Correctly measured classification accuracy is an important aspect not only to classify pre-designated classes such as disease versus control properly, but also to ensure that the biological question can be answered competently. We recognised that there has been minimal investigation of pre-treatment methods and its influence on classification accuracy within the metabolomics literature. The standard approach to pre-treatment prior to classification modelling often incorporates the use of methods such as autoscaling, which positions all variables on a comparable scale thus allowing one to achieve separation of two or more groups (target classes). This is often undertaken without any prior investigation into the influence of the pre-treatment method on the data and supervised learning techniques employed. Whilst this is useful for deriving essential information such as predictive ability or visual interpretation in many cases, as shown in this study the standard approach is not always the most suitable option available. Here, a study has been conducted to investigate the influence of six pre-treatment methods-autoscaling, range, level, Pareto and vast scaling, as well as no scaling-on four classification models, including: principal components-discriminant function analysis (PC-DFA), support vector machines (SVM), random forests (RF) and k-nearest neighbours (kNN)-using three publically available metabolomics data sets. We have demonstrated that undertaking different pre-treatment methods can greatly affect the interpretation of the statistical modelling outputs. The results have shown that data pre-treatment is context dependent and that there was no single superior method for all the data sets used. Whilst we did find that vast scaling produced the most robust models in terms of classification rate for PC-DFA of both NMR spectroscopy data sets, in general we conclude that both vast scaling and autoscaling produced similar and superior results in comparison to the other four pre-treatment methods on both NMR and GC-MS data sets. It is therefore our recommendation that vast scaling is the primary pre-treatment method to use as this method appears to be more stable and robust across all the different classifiers that were conducted in this study.

The influence of scaling metabolomics data on model classification accuracy

期刊

METABOLOMICS

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

The influence of scaling metabolomics data on model classification accuracy

期刊

METABOLOMICS

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文