☆ 3.8 Proceedings Paper

Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction

2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) (2017)

Journal

2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME)

Volume -, Issue -, Pages 159-170

Publisher

IEEE

DOI: 10.1109/ICSME.2017.51

Keywords

Change Classification; Cost Effectiveness; Evaluation; Bias

Funding

NSFC Program [61602403, 61572426]
National Key Technology R&D Program of the Ministry of Science and Technology of China [2015BAH17F01]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Effort-aware just-in-time (JIT) defect prediction aims at finding more defective software changes with limited code inspection cost. Traditionally, supervised models have been used; however, they require sufficient labelled training data, which is difficult to obtain, especially for new projects. Recently, Yang et al. proposed an unsupervised model (LT) and applied it to projects with rich historical bug data. Interestingly, they reported that, under the same inspection cost (i.e., 20 percent of the total lines of code modified by all changes), it could find more defective changes than a state-of-the-art supervised model (i.e., EALR). This is surprising as supervised models that benefit from historical data are expected to perform better than unsupervised ones. Their finding suggests that previous studies on defect prediction had made a simple problem too complex. Considering the potential high impact of Yang et al.'s work, in this paper, we perform a replication study and present the following new findings: (1) Under the same inspection budget, LT requires developers to inspect a large number of changes necessitating many more context switches. (2) Although LT finds more defective changes, many highly ranked changes are false alarms. These initial false alarms may negatively impact practitioners' patience and confidence. (3) LT does not outperform EALR when the harmonic mean of Recall and Precision (i.e., F1-score) is considered. Aside from highlighting the above findings, we propose a simple but improved supervised model called CBS. When compared with EALR, CBS detects about 15% more defective changes and also significantly improves Precision and F1-score. When compared with LT, CBS achieves similar results in terms of Recall, but it significantly reduces context switches and false alarms before first success. Finally, we also discuss the implications of our findings for practitioners and researchers.

Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction

Journal

2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME)

Publisher

IEEE

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction

Journal

2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME)

Publisher

IEEE

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper