☆ 3.8 Proceedings Paper

Online Defect Prediction for Imbalanced Data

2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 2 (2015)

期刊

2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 2

卷 -, 期 -, 页码 99-108

出版社

IEEE

DOI: 10.1109/ICSE.2015.139

关键词

类别

Computer Science, Software Engineering Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Many defect prediction techniques are proposed to improve software reliability. Change classification predicts defects at the change level, where a change is the modifications to one file in a commit. In this paper, we conduct the first study of applying change classification in practice. We identify two issues in the prediction process, both of which contribute to the low prediction performance. First, the data are imbalanced-there are much fewer buggy changes than clean changes. Second, the commonly used cross-validation approach is inappropriate for evaluating the performance of change classification. To address these challenges, we apply and adapt online change classification, resampling, and updatable classification techniques to improve the classification performance. We perform the improved change classification techniques on one proprietary and six open source projects. Our results show that these techniques improve the precision of change classification by 12.2-89.5% or 6.4-34.8 percentage points (pp.) on the seven projects. In addition, we integrate change classification in the development process of the proprietary project. We have learned the following lessons: 1) new solutions are needed to convince developers to use and believe prediction results, and prediction results need to be actionable, 2) new and improved classification algorithms are needed to explain the prediction results, and insensible and unactionable explanations need to be filtered or refined, and 3) new techniques are needed to improve the relatively low precision.

Online Defect Prediction for Imbalanced Data

期刊

2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 2

出版社

IEEE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Online Defect Prediction for Imbalanced Data

期刊

2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 2

出版社

IEEE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文