
Reducing the effect of sample bias for small data sets with double-weighted support vector transfer regression

Small data sets present a challenge in machine learning, especially in regression scenarios, where a lack of relevant data can lead to biased models. This article proposes a novel regression-based transfer learning model that transfers knowledge from a large, relevant data set to a small data set, reducing bias and improving prediction performance. Numerical results show that the proposed approach, double-weighted support vector transfer regression (DW-SVTR), significantly reduces the effect of small sample bias compared to standard machine learning methods.
Small data sets are an extremely challenging problem in machine learning (ML), particularly in regression scenarios, where the lack of relevant data can lead to ML models with large bias. However, there are many applications for which a purely data-driven procedure would be advantageous but a large amount of data is not available. This article proposes a novel regression-based transfer learning (TL) model to address this challenge, where TL is defined as knowledge transfer from a large, relevant data set (source domain data) to a small data set (target domain data). The proposed TL model is termed double-weighted support vector transfer regression (DW-SVTR), which couples least squares support vector machines for regression (LS-SVMR) with two weight functions. The first weight function uses kernel mean matching (KMM) to reweight the source domain data such that the mean values of the source and target domain data in a reproducing kernel Hilbert space (RKHS) are close. In this way, source domain points that are relevant to the target domain points receive a larger weight than irrelevant source domain points. The second weight is a function of estimated residuals, which aims to further reduce the negative interference of irrelevant source domain points. The proposed approach is assessed and validated using simulated data and by enhanced shear strength prediction of nonductile columns, for which only limited data are available. Specifically, the results for the latter show that the proposed DW-SVTR can reduce the root mean square error (RMSE) by 34% and enhance the coefficient of determination (R²) by 229%. These numerical results demonstrate that DW-SVTR significantly reduces the effect of small sample bias and improves prediction performance compared to standard ML methods.
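
The two-weight idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of either weight function or how they are combined, so the following choices are all assumptions made for illustration. The kernel is a Gaussian RBF, KMM is solved by projected gradient descent with only the box constraint (the usual constraint on the sum of the weights is omitted), the residual weight is a Gaussian decay scaled by the median absolute target residual, the two weights are multiplied, and target points keep weight 1. The function names, `gamma`, `sigma`, `B`, and `floor` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kmm_weights(Xs, Xt, sigma=1.0, B=10.0, n_iter=500):
    """Simplified KMM: minimize 0.5*b'Kb - kappa'b subject to 0 <= b <= B.
    Solved by projected gradient descent; the sum constraint is omitted."""
    ns, nt = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, sigma)
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, sigma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue of K
    beta = np.ones(ns)
    for _ in range(n_iter):
        beta = np.clip(beta - step * (K @ beta - kappa), 0.0, B)
    return beta

def fit_weighted_lssvr(X, y, v, gamma=10.0, sigma=1.0):
    """Weighted LS-SVM regression: solve the KKT linear system
    [[0, 1'], [1, K + diag(1/(gamma*v))]] [b; alpha] = [0; y]."""
    n = len(X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def predict(X_train, b, alpha, X_new, sigma=1.0):
    """Evaluate the fitted kernel expansion at new points."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def dw_svtr_sketch(Xs, ys, Xt, yt, gamma=10.0, sigma=1.0, floor=1e-4):
    """Two-stage sketch: KMM weights, preliminary fit, residual weights, refit."""
    beta = kmm_weights(Xs, Xt, sigma)                      # first weight (KMM)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    ones_t = np.ones(len(Xt))                              # assumed target weight
    v1 = np.concatenate([np.maximum(beta, floor), ones_t])
    b, alpha = fit_weighted_lssvr(X, y, v1, gamma, sigma)  # preliminary fit
    res_s = ys - predict(X, b, alpha, Xs, sigma)           # source residuals
    res_t = yt - predict(X, b, alpha, Xt, sigma)           # target residuals
    scale = 1.4826 * np.median(np.abs(res_t)) + 1e-12      # assumed robust scale
    rho = np.exp(-(res_s / scale) ** 2)                    # second weight (assumed form)
    v2 = np.concatenate([np.maximum(beta * rho, floor), ones_t])
    b, alpha = fit_weighted_lssvr(X, y, v2, gamma, sigma)  # final weighted fit
    return X, b, alpha, beta, rho

# Toy data: half of the source lies near the target, half lies far from it.
Xs = np.concatenate([np.linspace(0, 1, 20), np.linspace(5, 6, 20)])[:, None]
ys = np.concatenate([np.sin(Xs[:20, 0]), np.full(20, 5.0)])
Xt = np.linspace(0, 1, 5)[:, None]
yt = np.sin(Xt[:, 0])
X, b, alpha, beta, rho = dw_svtr_sketch(Xs, ys, Xt, yt)
pred = predict(X, b, alpha, Xt)
```

In this toy setup the source points that lie far from the target region receive KMM weights close to zero, so they contribute little to the final fit. A production version would more likely solve the full KMM quadratic program with a dedicated solver and tune `gamma` and `sigma` by cross-validation.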
