Review

A structured approach to predictive modeling of a two-class problem using multidimensional data sets

Journal

METHODS
Volume 61, Issue 1, Pages 73-85

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ymeth.2013.01.002

Keywords

Supervised learning; Classification; Data exploration; Machine learning; Data mining

Funding

  1. NIAID Clinical Proteomics Center [HHSN272200800048C, NIH-NHLBI-HHSN268201000037C]
  2. NHLBI Proteomics Center for Airway Inflammation
  3. National Center for Advancing Translational Sciences, NIH [UL1TR000071]


Biological experiments in the post-genome era can generate a staggering amount of complex data that challenges experimentalists to extract meaningful information. Increasingly, the success of an appropriately controlled experiment relies on a robust data analysis pipeline. In this paper, we present a structured approach to the analysis of multidimensional data that relies on a close, two-way communication between the bioinformatician and experimentalist. A sequential approach employing data exploration (visualization, graphical and analytical study), pre-processing, feature reduction and supervised classification using machine learning is presented. This standardized approach is illustrated by an example from a proteomic data analysis that has been used to predict the risk of infectious disease outcome. Strategies for model selection and post hoc model diagnostics are presented and applied to the case illustration. We discuss some of the practical lessons we have learned applying supervised classification to multidimensional data sets, one of which is the importance of feature reduction in achieving optimal modeling performance. (C) 2013 Elsevier Inc. All rights reserved.
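The sequence described in the abstract (pre-processing, feature reduction, supervised classification, model assessment) can be illustrated with a minimal sketch. The abstract does not name specific algorithms, so every choice below is an assumption made for illustration only: z-score scaling, ranking features by the absolute difference in class means, a nearest-centroid classifier, and leave-one-out cross-validation. The function names and the synthetic data generator are hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import mean, pstdev


def make_data(n, n_noise, seed):
    """Synthetic two-class data (hypothetical): 2 informative features, then noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2  # alternate classes so both are equally represented
        informative = [rng.gauss(4.0 * y, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


def fit_scaler(rows):
    """Per-feature mean and standard deviation, estimated from the given rows."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return mus, sds


def scale(rows, mus, sds):
    """Apply z-score scaling with previously fitted parameters."""
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def rank_features(rows, labels, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        diff = mean(r[j] for r in pos) - mean(r[j] for r in neg)
        scores.append((abs(diff), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_centroids(rows, labels):
    """Mean vector of each class."""
    return {
        c: [mean(col) for col in zip(*[r for r, y in zip(rows, labels) if y == c])]
        for c in (0, 1)
    }


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(centroids[c], row))


def loocv_accuracy(rows, labels, k):
    """Leave-one-out accuracy; scaling and feature ranking are refit in each fold."""
    hits = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        mus, sds = fit_scaler(train_x)
        train_s = scale(train_x, mus, sds)
        test_s = scale([rows[i]], mus, sds)[0]
        keep = rank_features(train_s, train_y, k)
        cents = fit_centroids([[r[j] for j in keep] for r in train_s], train_y)
        hits += predict(cents, [test_s[j] for j in keep]) == labels[i]
    return hits / len(rows)


if __name__ == "__main__":
    X, y = make_data(n=40, n_noise=20, seed=0)
    for k in (2, 22):  # reduced feature set versus all features
        print(k, loocv_accuracy(X, y, k))
```

In this sketch the scaler and the feature ranking are refit inside each cross-validation fold, so the held-out sample never influences which features are kept. Comparing a small `k` against the full feature set is one simple way to probe the effect of feature reduction that the abstract highlights.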
