☆ 4.6 Article

Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

PLOS ONE (2015)

期刊

PLOS ONE

卷 10, 期 5, 页码 -

出版社

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pone.0117900

关键词

类别

Multidisciplinary Sciences

资金

National Cancer Institute of the National Institutes of Health [R01CA136535-01, R01CA140772-01, R21CA167811-01, R21CA179327-01, R21CA195152-01]
National Institute of Diabetes and Digestive and Kidney Diseases [R01DK098503-02]
DOD Prostate Cancer Synergistic Idea Development Award [PC120857]
DOD Lung Cancer Idea Development New Investigator Award [LC130463]
DOD Prostate Cancer Idea Development Award
Ohio Third Frontier Technology development Grant
CTSC Coulter Annual Pilot Grant
Case Comprehensive Cancer Center Pilot Grant
VelaSano Grant Cleveland Clinic
Wallace H. Coulter Foundation Program in the Department of Biomedical Engineering at Case Western Reserve University

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets.

Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

期刊

PLOS ONE

出版社

PUBLIC LIBRARY SCIENCE

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

期刊

PLOS ONE

出版社

PUBLIC LIBRARY SCIENCE

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文