Article; Proceedings Paper

Cross-study validation for the assessment of prediction algorithms

BIOINFORMATICS (2014)

Journal

BIOINFORMATICS

Volume 30, Issue 12, Pages 105-112

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btu279

Funding

German Science Foundation [BO3139/2-2]
National Science Foundation [CAREER DBI-1053486, DMS-1042785]
National Cancer Institute [5P30 CA006516-46, 1RC4 CA156551-01]
National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1053486]

Abstract

Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context.

Methods: We develop and implement a systematic approach to 'cross-study validation', to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We compute the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation.

Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation.
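The pairwise scheme described in the Methods can be illustrated with a minimal sketch: train on each study, validate on every other study, and collect the C-index values in a matrix. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`concordance_index`, `fit_ridge`, `predict_risk`, `cross_study_matrix`), the toy ridge learner (which ignores censoring), the simulated studies, and the use of the plain off-diagonal mean as the summary. The paper evaluates several summaries and several learning algorithms that are not reproduced here.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: share of comparable pairs that the risk score orders correctly."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event has the higher predicted risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied scores count as half
    return concordant / comparable if comparable else float("nan")

def fit_ridge(X, time, event, lam=1.0):
    """Hypothetical placeholder learner: ridge regression of -log(time) on X.
    It ignores censoring and stands in for any survival learning algorithm."""
    Xc = X - X.mean(axis=0)
    y = -np.log(time)
    y = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ y)

def predict_risk(beta, X):
    """Linear risk score; higher means shorter predicted survival."""
    return X @ beta

def cross_study_matrix(studies, fit, predict):
    """Z[i, j] = C-index of the model trained on study i and validated on study j.
    The diagonal is left as NaN in this sketch."""
    k = len(studies)
    Z = np.full((k, k), np.nan)
    for i, (X_tr, t_tr, e_tr) in enumerate(studies):
        model = fit(X_tr, t_tr, e_tr)
        for j, (X_va, t_va, e_va) in enumerate(studies):
            if i != j:
                Z[i, j] = concordance_index(t_va, e_va, predict(model, X_va))
    return Z

# Simulated stand-in data: three "studies" sharing one true signal, each with its own shift.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5, 0.0, 0.0, 0.5])
studies = []
for shift in (0.0, 0.5, -0.5):
    X = rng.normal(size=(40, 5)) + shift
    time = np.exp(-(X @ beta_true) + rng.normal(scale=0.5, size=40))
    event = rng.random(40) < 0.7   # roughly 30% of samples treated as censored
    studies.append((X, time, event))

Z = cross_study_matrix(studies, fit_ridge, predict_risk)
csv_summary = np.nanmean(Z)   # one possible summary: mean of the off-diagonal entries
```

Comparing `csv_summary` with a within-study cross-validation estimate for the same learner would mirror the comparison the abstract describes, although this toy setup is not expected to reproduce the reported results.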
