Journal
TECHNOMETRICS
Volume 64, Issue 2, Pages 166-176
Publisher
TAYLOR & FRANCIS INC
DOI: 10.1080/00401706.2021.1921037
Keywords
Cross-validation; Quasi-Monte Carlo; Testing; Training; Validation
Funding
- U.S. National Science Foundation [CBET-1921873]
In this article, an optimal method named SPlit for splitting a dataset into training and testing sets is proposed, which is based on the support points algorithm and can be applied to both regression and classification problems. The implementation on real datasets shows substantial improvement compared to the commonly used random splitting procedure.
In this article, we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of support points (SP), which was initially developed for finding the optimal representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to deal with categorical variables so that SPlit can be applied to both regression and classification problems. The implementation of SPlit on real datasets shows substantial improvement in the worst-case testing performance for several modeling methods compared to the commonly used random splitting procedure.
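The sequential nearest neighbor step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a set of representative points is already available, and it matches each point in turn to the closest row of the dataset that has not yet been used. The function names, the use of Euclidean distance, and the choice to treat the selected rows as the testing set are assumptions made for illustration. The representative points here are random draws standing in for support points, which this sketch does not compute.

```python
import math
import random


def sequential_nearest_neighbor(data, points):
    """For each representative point, in order, pick the nearest unused row of data.

    Returns the list of selected row indices (no index is selected twice).
    Hypothetical helper, not taken from the paper's implementation.
    """
    if len(points) > len(data):
        raise ValueError("need at least as many data rows as representative points")
    available = [True] * len(data)
    chosen = []
    for p in points:
        # Nearest still-available row under Euclidean distance (an assumption).
        idx = min(
            (i for i in range(len(data)) if available[i]),
            key=lambda i: math.dist(data[i], p),
        )
        chosen.append(idx)
        available[idx] = False  # sample without replacement
    return chosen


def split_by_points(data, points):
    """Split row indices into (train, test); the matched rows form the test set."""
    test_idx = sequential_nearest_neighbor(data, points)
    selected = set(test_idx)
    train_idx = [i for i in range(len(data)) if i not in selected]
    return train_idx, test_idx


rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Stand-in representative points: random draws, NOT actual support points.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
train_idx, test_idx = split_by_points(data, points)
print(len(train_idx), len(test_idx))
```

Because each matched row is removed from the pool before the next point is processed, the selected subset contains no repeated rows, which is what makes the procedure usable for subsampling an existing dataset rather than generating new points.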