Article

How large a training set is needed to develop a classifier for microarray data?

CLINICAL CANCER RESEARCH (2008)

Journal

CLINICAL CANCER RESEARCH

Volume 14, Issue 1, Pages 108-114

Publisher

AMER ASSOC CANCER RESEARCH

DOI: 10.1158/1078-0432.CCR-07-0443

Category

Oncology

Abstract

Purpose: A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging.

Experimental Design: We present a model-based approach to determining the sample size required to adequately train a classifier.

Results: It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided.

Conclusion: We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.
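To give a feel for how the three quantities named in the Results interact, the following Python sketch estimates classifier accuracy by brute-force Monte Carlo simulation. It is not the paper's model-based formula and not the calculation behind the authors' web site; it is an illustration under simplifying assumptions that are ours: a single informative gene, independent genes with unit variance, "standardized fold change" read as the difference in class means divided by the standard deviation, and a classifier that picks the one gene with the largest difference in class means and thresholds it at the midpoint. All function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def optimal_accuracy(effect):
    """Accuracy of a midpoint threshold on the informative gene when the
    class means are known exactly (class means at -effect/2 and +effect/2)."""
    return PHI(effect / 2.0)


def expected_accuracy(n_train, effect, prevalence, n_genes, n_reps=200, seed=0):
    """Monte Carlo estimate of the true accuracy of a single-gene classifier
    trained on n_train samples (illustrative assumptions only)."""
    rng = random.Random(seed)
    # Fix the class sizes from the prevalence, keeping at least one per class.
    n1 = min(n_train - 1, max(1, round(n_train * prevalence)))
    n0 = n_train - n1
    # Gene 0 is informative; all other genes are pure noise.
    true_means = [(-effect / 2.0, effect / 2.0)] + [(0.0, 0.0)] * (n_genes - 1)
    total = 0.0
    for _ in range(n_reps):
        best = None
        for mu0, mu1 in true_means:
            # The mean of n unit-variance draws is N(mu, 1/n), so the class
            # sample means are drawn directly instead of simulating samples.
            m0 = rng.gauss(mu0, n0 ** -0.5)
            m1 = rng.gauss(mu1, n1 ** -0.5)
            if best is None or abs(m1 - m0) > best[0]:
                best = (abs(m1 - m0), m0, m1, mu0, mu1)
        _, m0, m1, mu0, mu1 = best
        cut = (m0 + m1) / 2.0            # midpoint threshold on the chosen gene
        sign = 1.0 if m1 > m0 else -1.0  # predict class 1 when sign*(x - cut) > 0
        acc1 = PHI(sign * (mu1 - cut))   # P(correct | class 1)
        acc0 = PHI(sign * (cut - mu0))   # P(correct | class 0)
        total += prevalence * acc1 + (1.0 - prevalence) * acc0
    return total / n_reps


def required_training_size(effect, prevalence, n_genes, tolerance=0.05,
                           candidates=range(10, 201, 10), n_reps=200):
    """Smallest candidate size whose estimated accuracy is within `tolerance`
    of the known-means accuracy; None if no candidate qualifies."""
    target = optimal_accuracy(effect) - tolerance
    for n in candidates:
        if expected_accuracy(n, effect, prevalence, n_genes, n_reps) >= target:
            return n
    return None


if __name__ == "__main__":
    for genes in (100, 1000):
        print(genes, required_training_size(2.0, 0.5, genes))
```

Under these assumptions, a larger number of noise genes makes it more likely that a noise gene is selected by chance, and a smaller standardized fold change makes the informative gene harder to pick out, so both push the required training set size upward. The exact numbers depend on the simplifications above and should not be read as the paper's results.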
