Article

SPlit: An Optimal Method for Data Splitting

Journal

TECHNOMETRICS
Volume 64, Issue 2, Pages 166-176

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/00401706.2021.1921037

Keywords

Cross-validation; Quasi-Monte Carlo; Testing; Training; Validation

Funding

  1. U.S. National Science Foundation [CBET-1921873]

In this article, we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of support points (SP), which was initially developed for finding the optimal representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to deal with categorical variables so that SPlit can be applied to both regression and classification problems. The implementation of SPlit on real datasets shows substantial improvement in the worst-case testing performance for several modeling methods compared to the commonly used random splitting procedure.
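The subsampling step described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The support points are taken as given, because computing them requires the SP optimization itself, which is omitted here. Distances are Euclidean on numeric columns only, so the categorical extension is not shown, and nearest neighbours are found by brute force. The helper names `sequential_nn_subsample` and `energy_distance` are hypothetical. Support points are defined as the point set that minimizes the energy distance to the target distribution, so an `energy_distance` helper is included for checking how closely a chosen testing set resembles the full data.

```python
import math
from typing import List, Sequence, Tuple

# Rows are tuples of numeric features; categorical handling is NOT shown here.
Row = Sequence[float]


def euclidean(a: Row, b: Row) -> float:
    """Plain Euclidean distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequential_nn_subsample(
    data: Sequence[Row], support_points: Sequence[Row]
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: match each support point, in order, to the nearest
    data row that has not been used yet.

    Returns (test_indices, train_indices). The support points are assumed to be
    computed elsewhere; this sketch only performs the matching step.
    """
    if len(support_points) > len(data):
        raise ValueError("cannot draw more test rows than the dataset contains")
    available = set(range(len(data)))
    test_idx: List[int] = []
    for point in support_points:
        # Brute-force search over remaining rows; ties go to the lower index.
        best = min(available, key=lambda i: (euclidean(data[i], point), i))
        test_idx.append(best)
        available.remove(best)  # each row can enter the testing set only once
    return test_idx, sorted(available)


def energy_distance(x: Sequence[Row], y: Sequence[Row]) -> float:
    """Energy distance between two empirical samples (V-statistic form):
    2*mean||x - y|| - mean||x - x'|| - mean||y - y'||."""
    n, m = len(x), len(y)
    cross = sum(euclidean(a, b) for a in x for b in y) / (n * m)
    within_x = sum(euclidean(a, b) for a in x for b in x) / (n * n)
    within_y = sum(euclidean(a, b) for a in y for b in y) / (m * m)
    return 2.0 * cross - within_x - within_y


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
    test, train = sequential_nn_subsample(data, [(0.9,), (9.0,)])
    print(test, train)  # [1, 4] [0, 2, 3]
```

Under this reading, an 80/20 split would compute roughly n/5 support points of the full dataset and pass them in; the returned indices form the testing set and the remaining rows the training set. The sequential matching is needed because support points are generally not rows of the dataset, and removing each matched row guarantees that the testing set contains no repeats.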