Article

A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction

Journal

IEEE TRANSACTIONS ON RELIABILITY
Volume 69, Issue 1, Pages 40-54

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TR.2019.2895462

Keywords

Training; Learning systems; Software; Measurement; NASA; Predictive models; Machine learning; Class-imbalance; cross-project; ensemble learning; software defect prediction (SDP); within-project

Funding

  1. National Natural Science Foundation of China [61673384, 61502497, 61562015]

Software defect prediction (SDP) is a practical way to improve testing efficiency and help ensure software reliability. However, real software projects contain far more clean instances than defective ones, which produces a severely skewed class distribution and degrades classifier performance. Solving the class-imbalance problem in SDP has therefore attracted growing attention from industry and academia in software engineering. In this paper, we propose a novel class-imbalance learning approach for both the within-project and the cross-project class-imbalance problems. We use the idea of stratification embedded in nearest neighbor (STr-NN) to produce evolving training datasets with balanced data. For within-project prediction, we apply the STr-NN approach directly. For cross-project prediction, we first apply transfer component analysis to mitigate the distribution differences between the source and target datasets, and then apply the STr-NN approach to the transferred data. We conduct experiments on the PROMISE and NASA datasets using ensemble learning based on weighted voting. Experimental results indicate that our approach achieves higher area under the curve (AUC) and recall, and comparable probability of false alarm (pf) and F-measure, compared with some existing methods for the class-imbalance problem.
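
The threshold-based measures named in the abstract (recall, pf, and F-measure) can be illustrated with a minimal sketch, assuming binary labels where 1 marks a defective module and 0 a clean one. The function name `defect_metrics` and the sample labels are hypothetical and are not taken from the paper; the formulas are the standard confusion-matrix definitions. AUC is left out because it needs predicted scores, not hard labels.

```python
from typing import Dict, Sequence


def defect_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute recall, pf, and F-measure from binary labels (1 = defective)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, predicted clean

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "pf": pf, "f_measure": f_measure}


# Hypothetical imbalanced sample: 3 defective modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
metrics = defect_metrics(y_true, y_pred)
```

On imbalanced data a classifier that labels every module clean gets a pf of 0 but also a recall of 0, which is why the abstract reports recall and pf together.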
