☆ 4.7 Article

Cross-validation under separate sampling: strong bias and how to correct it

BIOINFORMATICS (2014)

期刊

BIOINFORMATICS

卷 30, 期 23, 页码 3349-3355

出版社

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btu527

关键词

类别

Biochemical Research Methods Biotechnology & Applied Microbiology Computer Science, Interdisciplinary Applications Mathematical & Computational Biology Statistics & Probability

资金

NSF [CCF-0845407]
NIH (Nutrition, Biostatistics and Bioinformatics) from the National Cancer Institute [2R25CA090301]
Division of Computing and Communication Foundations
Direct For Computer & Info Scie & Enginr [845407] Funding Source: National Science Foundation

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Motivation: It is commonly assumed in pattern recognition that cross-validation error estimation is 'almost unbiased' as long as the number of folds is not too small. While this is true for random sampling, it is not true with separate sampling, where the populations are independently sampled, which is a common situation in bioinformatics. Results: We demonstrate, via analytical and numerical methods, that classical cross-validation can have strong bias under separate sampling, depending on the difference between the sampling ratios and the true population probabilities. We propose a new separate-sampling cross-validation error estimator, and prove that it satisfies an 'almost unbiased' theorem similar to that of random-sampling cross-validation. We present two case studies with previously published data, which show that the results can change drastically if the correct form of cross-validation is used.

Cross-validation under separate sampling: strong bias and how to correct it

期刊

BIOINFORMATICS

出版社

OXFORD UNIV PRESS

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Cross-validation under separate sampling: strong bias and how to correct it

期刊

BIOINFORMATICS

出版社

OXFORD UNIV PRESS

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文