Deep Semantic Feature Learning for Software Defect Prediction

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2020)

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

Volume 46, Issue 12, Pages 1267-1293

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TSE.2018.2877612

Keywords

Semantics; Predictive models; Feature extraction; Task analysis; Computer bugs; Data models; Defect prediction; quality assurance; deep learning; semantic features

Funding

Natural Sciences and Engineering Research Council of Canada
National Research Foundation of Korea (NRF) - Korea government (MSIT) [2018R1C1B6001919]
National Research Foundation of Korea [2018R1C1B6001919] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Abstract

Software defect prediction, which predicts defective code regions, can assist developers in finding bugs and prioritizing their testing efforts. Traditional defect prediction features often fail to capture the semantic differences between different programs. This degrades the performance of the prediction models built on these traditional features. Thus, the capability to capture the semantics in programs is required to build accurate prediction models. To bridge the gap between semantics and defect prediction features, we propose leveraging a powerful representation-learning algorithm, deep learning, to learn the semantic representations of programs automatically from source code files and code changes. Specifically, we leverage a deep belief network (DBN) to automatically learn semantic features using token vectors extracted from the programs' abstract syntax trees (AST) (for file-level defect prediction models) and source code changes (for change-level defect prediction models). We examine the effectiveness of our approach on two file-level defect prediction tasks (i.e., file-level within-project defect prediction and file-level cross-project defect prediction) and two change-level defect prediction tasks (i.e., change-level within-project defect prediction and change-level cross-project defect prediction). Our experimental results indicate that the DBN-based semantic features can significantly improve the examined defect prediction tasks. Specifically, the improvements of semantic features against existing traditional features (in F1) range from 2.1 to 41.9 percentage points for file-level within-project defect prediction, from 1.5 to 13.4 percentage points for file-level cross-project defect prediction, from 1.0 to 8.6 percentage points for change-level within-project defect prediction, and from 0.6 to 9.9 percentage points for change-level cross-project defect prediction.
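
The token-vector step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which AST node types are kept or how tokens are encoded, so the node selection, the integer mapping, and the zero padding below are all assumptions, and Python's standard `ast` module stands in for whatever parser the paper used. The names `extract_tokens`, `build_vocab`, and `encode` are hypothetical.

```python
import ast

# Control-flow node types kept as tokens (an assumed selection).
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try)


def token_for(node: ast.AST):
    """Return a token string for nodes of interest, else None."""
    if isinstance(node, ast.Call):
        # Invocations are represented by the called name.
        if isinstance(node.func, ast.Name):
            return node.func.id
        if isinstance(node.func, ast.Attribute):
            return node.func.attr
        return None
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        # Declarations are represented by their declared name.
        return node.name
    if isinstance(node, CONTROL_NODES):
        # Control-flow nodes are represented by their node type.
        return type(node).__name__
    return None


def extract_tokens(source: str) -> list[str]:
    """Collect tokens in depth-first pre-order, which follows source order."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tok = token_for(node)
        if tok is not None:
            tokens.append(tok)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign each distinct token an integer id starting at 1 (0 is padding)."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], length: int) -> list[int]:
    """Map tokens to ids, drop unknown tokens, truncate, and pad with zeros."""
    ids = [vocab[t] for t in tokens if t in vocab][:length]
    return ids + [0] * (length - len(ids))


SAMPLE = """
def load(path):
    f = open(path)
    for line in f:
        if line:
            print(line)
    f.close()
"""

tokens = extract_tokens(SAMPLE)
vocab = build_vocab([tokens])
vector = encode(tokens, vocab, 8)
```

In a full pipeline, fixed-length integer vectors like `vector` would be the input from which a model such as the DBN learns features. Pre-order traversal is used so that the token sequence keeps the order in which constructs appear in the file.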
