Reducing the effect of sample bias for small data sets with double-weighted support vector transfer regression

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING (2021)

Journal

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING

Volume 36, Issue 3, Pages 248-263

Publisher

WILEY

DOI: 10.1111/mice.12617

Automated Summary

Small data sets present a challenge in machine learning, especially in regression scenarios, where a lack of relevant data can lead to biased models. This article proposes a novel regression-based transfer learning model that transfers knowledge from a large, relevant data set to a small data set, reducing bias and improving prediction performance. The proposed approach, DW-SVTR, significantly reduces the impact of small-sample bias compared to standard ML methods, as demonstrated through numerical results.

Abstract

Small data sets are an extremely challenging problem in machine learning (ML), particularly in regression scenarios, because a lack of relevant data can lead to ML models with large bias. However, there are many applications for which a purely data-driven procedure would be advantageous but a large amount of data is not available. This article proposes a novel regression-based transfer learning (TL) model to address this challenge, where TL is defined as knowledge transfer from a large, relevant data set (source domain data) to a small data set (target domain data). The proposed TL model is termed double-weighted support vector transfer regression (DW-SVTR), which couples least squares support vector machines for regression (LS-SVMR) with two weight functions. The first weight function uses kernel mean matching (KMM) to reweight the source domain data such that the mean values of the source and target domain data in a reproducing kernel Hilbert space (RKHS) are close. In this way, the source domain data points relevant to the target domain points receive a larger weight than irrelevant source domain points. The second weight is a function of estimated residuals, which aims to further reduce the negative interference of irrelevant source domain points. The proposed approach is assessed and validated on simulated data and on shear strength prediction of nonductile columns, for which only limited data are available. Specifically, the results for the latter show that the proposed DW-SVTR can reduce the root mean square error (RMSE) by 34% and increase the coefficient of determination (R²) by 229%. These numerical results demonstrate that the DW-SVTR significantly reduces the effect of small-sample bias and improves prediction performance compared to standard ML methods.
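The two-stage weighting described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the KMM quadratic program is solved approximately by projected gradient descent with box constraints only (the usual constraint on the sum of the weights is omitted), the residual-based second weight uses a hypothetical Gaussian decay on robustly scaled residuals, target points are given unit weight, and the data are synthetic. All function names and the parameters `gamma`, `sigma`, and `B` are likewise illustrative.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kmm_weights(Xs, Xt, sigma=1.0, B=10.0, n_iter=2000):
    """Approximate KMM: minimise 0.5*b'Kb - kappa'b subject to 0 <= b <= B.
    Assumption: projected gradient descent, sum constraint omitted."""
    ns, nt = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, sigma)
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, sigma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue of K
    beta = np.ones(ns)
    for _ in range(n_iter):
        beta = np.clip(beta - step * (K @ beta - kappa), 0.0, B)
    return beta

def weighted_lssvr_fit(X, y, v, gamma=10.0, sigma=1.0):
    """Weighted LS-SVM regression: solve the linear KKT system
    [[0, 1'], [1, K + diag(1/(gamma*v))]] [b; alpha] = [0; y]."""
    n = len(X)
    v = np.maximum(v, 1e-6)  # floor avoids division by zero
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvr_predict(Xtrain, b, alpha, Xnew, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(Xnew, Xtrain, sigma) @ alpha + b

def dw_svtr_sketch(Xs, ys, Xt, yt, gamma=10.0, sigma=1.0):
    """Two-pass fit: KMM weights first, then residual-based reweighting."""
    beta = kmm_weights(Xs, Xt, sigma)                 # first weight
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    v1 = np.concatenate([beta, np.ones(len(Xt))])     # target weight = 1
    b, alpha = weighted_lssvr_fit(X, y, v1, gamma, sigma)
    # Second weight (hypothetical form): Gaussian decay of source residuals
    # scaled by a robust MAD estimate of their spread.
    resid = ys - lssvr_predict(X, b, alpha, Xs, sigma)
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid))) + 1e-12
    w2 = np.exp(-0.5 * (resid / scale) ** 2)
    v2 = np.concatenate([beta * w2, np.ones(len(Xt))])
    b, alpha = weighted_lssvr_fit(X, y, v2, gamma, sigma)
    return X, b, alpha, beta, w2

# Synthetic demo: a large source set and a small target set.
rng = np.random.default_rng(0)
Xs = rng.uniform(-3, 3, size=(60, 1))
ys = np.sin(Xs[:, 0]) + 0.1 * rng.standard_normal(60)
Xt = rng.uniform(0, 3, size=(8, 1))
yt = np.sin(Xt[:, 0]) + 0.1 * rng.standard_normal(8)

X_all, b, alpha, beta, w2 = dw_svtr_sketch(Xs, ys, Xt, yt)
Xq = np.linspace(0, 3, 5)[:, None]
pred = lssvr_predict(X_all, b, alpha, Xq)
```

In this sketch a source point affects the final fit only through the product of its two weights, so a point that is either far from the target distribution in the RKHS or poorly explained by the first-pass model contributes little to the second fit.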
