Article

Towards Reliable Online Just-in-Time Software Defect Prediction

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 49, Issue 3, Pages 1342-1358

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2022.3175789

Keywords

Software; Reliability; Training; Codes; Software reliability; Software quality; Indexes; Just-in-time software defect prediction; online learning; concept drift; verification latency; class imbalance learning

Throughout its development, a software project is affected by different phases, modules, and developers, leading to challenges in Just-in-Time Software Defect Prediction (JIT-SDP) due to concept drift and verification latency. This study provides the first detailed analysis of the types and impacts of concept drift on JIT-SDP classifiers. It proposes a new approach to improve the stability and reliability of predictive performance over time.
Throughout its development period, a software project experiences different phases, comprises modules of varying complexity, and is touched by many different developers. Hence, it is natural that problems such as Just-in-Time Software Defect Prediction (JIT-SDP) are affected by changes in the defect-generating process (concept drift), potentially hindering predictive performance. JIT-SDP also suffers from delays in receiving the labels of training examples (verification latency), potentially exacerbating the challenges posed by concept drift and further hindering predictive performance. However, little is known about what types of concept drift affect JIT-SDP and how they affect JIT-SDP classifiers in view of verification latency. This work performs the first detailed analysis of these questions. Among other findings, it reveals that different types of concept drift, together with verification latency, significantly impair the stability of the predictive performance of existing JIT-SDP approaches, drastically affecting their reliability over time. Based on these findings, a new JIT-SDP approach is proposed, aimed at providing higher and more stable (i.e., reliable) predictive performance over time. Experiments based on ten GitHub open source projects show that our approach was capable of producing significantly more stable predictive performance on all investigated datasets while maintaining or improving the predictive performance obtained by state-of-the-art methods.
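To make the verification-latency setting concrete, the following is a minimal illustrative sketch (not the paper's method): an online learner must predict each incoming commit immediately, but the true defect label only arrives a fixed number of commits later, so training always lags prediction. The toy majority-class learner, the `LATENCY` value, and the synthetic commit stream are all hypothetical, chosen only to show the delayed-label loop.

```python
from collections import deque

LATENCY = 3  # hypothetical label delay, measured in commits


class MajorityClassifier:
    """Toy online learner: predicts the majority label seen so far."""

    def __init__(self):
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        return 1 if self.counts[1] > self.counts[0] else 0

    def update(self, x, y):
        self.counts[y] += 1


def online_jit_sdp(stream, latency=LATENCY):
    """Predict each commit on arrival; train only once its label arrives."""
    model = MajorityClassifier()
    pending = deque()  # commits still awaiting their labels
    predictions = []
    for features, label in stream:
        # Prediction must be made now, before this commit's label is known.
        predictions.append(model.predict(features))
        pending.append((features, label))
        # A label becomes available only after `latency` further commits.
        if len(pending) > latency:
            old_x, old_y = pending.popleft()
            model.update(old_x, old_y)
    return predictions


# Synthetic commit stream: every fourth commit is defect-inducing.
stream = [((i,), int(i % 4 == 0)) for i in range(10)]
preds = online_jit_sdp(stream)
```

Note how the first `latency + 1` predictions are made by an entirely untrained model; in a real project with drifting defect patterns, this lag means the model may still be learning yesterday's concept while predicting today's commits.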
