Article

DSSDPP: Data Selection and Sampling Based Domain Programming Predictor for Cross-Project Defect Prediction

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 49, Issue 4, Pages 1941-1963

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2022.3204589

Keywords

Data models; Prediction algorithms; Measurement; Predictive models; Tuning; Transfer learning; Programming; Cross-project defect prediction; domain programming predictor; data selection; data sampling; transfer learning; software quality assurance


Cross-project defect prediction aims to identify defective software modules in a target project using historical data from other projects. However, existing methods often overlook the selection of appropriate source data and suffer from class imbalance issues. This paper proposes a non-parametric data selection and sampling based domain programming predictor (DSSDPP) that overcomes these limitations. DSSDPP learns a discriminative transfer classifier by leveraging the structures of source and target data, achieving better prediction results than competing methods in single-source and multi-source scenarios.

Cross-project defect prediction (CPDP) refers to recognizing defective software modules in one project (i.e., target) using historical data collected from other projects (i.e., source), which can help developers find defects and prioritize their testing efforts. Unfortunately, there is often a large distribution difference between the source and target data. Most CPDP methods neglect to select appropriate source data for a given target at the project level. More importantly, existing CPDP models are parametric methods, which usually require intensive parameter selection and tuning to achieve better prediction performance. This hinders the wide applicability of CPDP in practice. Moreover, most CPDP methods do not address the cross-project class imbalance problem. These limitations lead to suboptimal CPDP results. In this paper, we propose a novel data selection and sampling based domain programming predictor (DSSDPP) for CPDP, which addresses the above limitations. DSSDPP is a non-parametric CPDP method, which can perform knowledge transfer across projects without the need for parameter selection and tuning. By exploiting the structures of source and target data, DSSDPP can learn a discriminative transfer classifier for identifying defects of the target project. Extensive experiments on 22 projects from four datasets indicate that DSSDPP achieves better MCC and AUC results than a range of competing methods in both single-source and multi-source scenarios. Since DSSDPP is easy, effective, extensible, and efficient, we suggest that future work can use it with well-chosen source data to conduct CPDP, especially for projects with a limited computational budget.
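
The abstract reports results in terms of MCC (Matthews correlation coefficient) and AUC (area under the ROC curve), two measures commonly preferred when defective modules are a small minority. As a minimal sketch of how these two measures are computed from standard definitions, and not the authors' evaluation code, the following uses made-up labels and scores; the function names, the toy data, and the 0.5 decision threshold are all illustrative assumptions.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = defective)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen defective module
    receives a higher score than a randomly chosen clean one
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one module of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy target project: 2 defective and 4 clean modules (hypothetical data).
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]        # hypothetical predicted defect scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(mcc(y_true, y_pred))  # 0.25
print(auc(y_true, scores))  # 0.875
```

MCC depends on the chosen threshold because it is computed from hard predictions, whereas AUC is threshold-free and depends only on how the scores rank the modules.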

