
Reducing Features to Improve Code Change-Based Bug Prediction

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2013)

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

Volume 39, Issue 4, Pages 552-569

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TSE.2012.43

Keywords

Reliability; bug prediction; machine learning; feature selection

Funding

Directorate for Computer & Information Science & Engineering
Division of Computing and Communication Foundations [0811865]; Funding Source: National Science Foundation


Abstract

Machine learning classifiers have recently emerged as a way to predict the introduction of bugs in changes made to source code files. The classifier is first trained on software history, and then used to predict if an impending change causes a bug. Drawbacks of existing classifier-based bug prediction techniques are insufficient performance for practical use and slow prediction times due to a large number of machine learned features. This paper investigates multiple feature selection techniques that are generally applicable to classification-based bug prediction methods. The techniques discard less important features until optimal classification performance is reached. The total number of features used for training is substantially reduced, often to less than 10 percent of the original. The performance of Naive Bayes and Support Vector Machine (SVM) classifiers when using this technique is characterized on 11 software projects. Naive Bayes using feature selection provides significant improvement in buggy F-measure (21 percent improvement) over prior change classification bug prediction results (by the second and fourth authors [28]). The SVM's improvement in buggy F-measure is 9 percent. Interestingly, an analysis of performance for varying numbers of features shows that strong performance is achieved at even 1 percent of the original number of features.
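The loop described in the abstract (rank features, discard the less important ones, and keep the subset with the best buggy F-measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the specific selection techniques, so the ranking function `feature_score`, the halving schedule, and the binary feature encoding are all assumptions made for illustration, and the Naive Bayes variant is a plain Bernoulli model with Laplace smoothing.

```python
import math

def buggy_f_measure(y_true, y_pred):
    """F-measure for the buggy class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def train_nb(X, y, features):
    """Bernoulli Naive Bayes with Laplace smoothing, restricted to `features`."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = {f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2)
                 for f in features}
        model[c] = (math.log(prior), probs)
    return model

def predict_nb(model, x):
    """Return 1 (buggy) if the buggy class has the higher log-posterior."""
    scores = {}
    for c, (log_prior, probs) in model.items():
        s = log_prior
        for f, p in probs.items():
            s += math.log(p if x[f] else 1 - p)
        scores[c] = s
    return 1 if scores[1] > scores[0] else 0

def feature_score(X, y, f):
    """Hypothetical ranking score (an assumption, not from the paper):
    absolute difference in feature frequency between buggy and clean changes."""
    buggy = [x[f] for x, label in zip(X, y) if label == 1]
    clean = [x[f] for x, label in zip(X, y) if label == 0]
    return abs(sum(buggy) / max(len(buggy), 1) - sum(clean) / max(len(clean), 1))

def select_features(X_train, y_train, X_val, y_val):
    """Rank features, then repeatedly drop the lower-ranked half, keeping the
    subset with the best buggy F-measure on the validation data."""
    features = sorted(range(len(X_train[0])),
                      key=lambda f: feature_score(X_train, y_train, f),
                      reverse=True)
    best_f, best_features = -1.0, features
    while features:
        model = train_nb(X_train, y_train, features)
        preds = [predict_nb(model, x) for x in X_val]
        f1 = buggy_f_measure(y_val, preds)
        if f1 >= best_f:  # on ties, prefer the smaller feature set
            best_f, best_features = f1, list(features)
        features = features[: len(features) // 2]
    return best_features, best_f
```

Each row of `X_train` is assumed to be a list of 0/1 values, one per feature of a code change, and each label is 1 for a buggy change and 0 for a clean one. Halving is only one possible elimination schedule; removing a fixed number or a smaller fraction of features per step trades run time for a finer search.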
