A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY (2017)

Journal

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY

Volume 32, Issue 6, Pages 1090-1107

Publisher

SCIENCE PRESS

DOI: 10.1007/s11390-017-1785-0

Keywords

software defect prediction; cross-project defect prediction; feature selection; feature clustering; density-based clustering

Funding

National Natural Science Foundation of China [61373012, 91218302, 61321491, 61202006]
Collaborative Innovation Center of Novel Software Technology and Industrialization
Open Project of State Key Laboratory for Novel Software Technology at Nanjing University [KFKT2016B18]
National Basic Research 973 Program of China [2009CB320705]

Abstract

Cross-project defect prediction (CPDP) uses labeled data from external source software projects to compensate for the shortage of useful data in the target project, so that a meaningful classification model can be built. However, the distribution gap between the software features extracted from the source and target projects may be too large for the mixed data to be useful for training. In this paper, we propose a novel cluster-based method, FeSCH (Feature Selection Using Clusters of Hybrid-Data), which alleviates these distribution differences through feature selection. FeSCH consists of two phases: the feature clustering phase groups features using a density-based clustering method, and the feature selection phase selects features from each cluster using a ranking strategy. For the second phase, we design three heuristic ranking strategies tailored to CPDP. To investigate the prediction performance of FeSCH, we conduct experiments on real-world software projects and study the effects of its design options (such as the ranking strategy, the feature selection ratio, and the classifier). The experimental results demonstrate the effectiveness of FeSCH. First, compared with state-of-the-art baseline methods, FeSCH achieves better performance, and its performance is less sensitive to the choice of classifier. Second, FeSCH improves performance by effectively selecting features across feature categories, and it provides guidelines for selecting useful features for defect prediction.
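The two phases described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation, and several choices in it are assumptions: the distance between two features is taken as 1 - |Pearson r|; phase one uses a simplified density-peaks clustering (in the style of Rodriguez and Laio) as one possible density-based method; and phase two ranks features inside each cluster by |correlation with the defect label| on the mixed source data, a stand-in heuristic rather than any of the paper's three ranking strategies. The names `select_features`, `density_peaks`, `dc`, and `n_clusters` are hypothetical; `ratio` plays the role of the feature selection ratio.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def feature_distance(columns: List[List[float]]) -> Matrix:
    """Assumed distance between features: 1 - |Pearson r|."""
    k = len(columns)
    return [[0.0 if i == j else 1.0 - abs(pearson(columns[i], columns[j]))
             for j in range(k)] for i in range(k)]


def density_peaks(dist: Matrix, dc: float, n_clusters: int) -> List[int]:
    """Simplified density-peaks clustering; expects at least one feature."""
    k = len(dist)
    # rho: number of other features closer than the cutoff distance dc
    rho = [sum(1 for j in range(k) if j != i and dist[i][j] < dc) for i in range(k)]
    order = sorted(range(k), key=lambda i: (-rho[i], i))  # densest first
    delta = [0.0] * k
    parent = [-1] * k  # nearest feature that precedes i in density order
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest feature: use its largest distance
            continue
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    # Centers: the densest feature plus the next highest rho * delta scores
    ranked = sorted(range(k), key=lambda i: (-rho[i] * delta[i], i))
    n = max(1, min(n_clusters, k))
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n - 1]
    labels = [-1] * k
    for cid, c in enumerate(centers):
        labels[c] = cid
    for i in order:  # non-centers inherit the label of their denser neighbour
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def select_features(X: Matrix, y: Sequence[int], ratio: float,
                    dc: float = 0.5, n_clusters: int = 2) -> List[int]:
    """Phase 1: cluster features; phase 2: keep the top `ratio` of each cluster."""
    columns = [list(col) for col in zip(*X)]  # one list per feature (metric)
    labels = density_peaks(feature_distance(columns), dc, n_clusters)
    # Stand-in ranking: |correlation with the defect label| on the mixed data
    relevance = [abs(pearson(col, y)) for col in columns]
    selected: List[int] = []
    for cid in sorted(set(labels)):
        members = [f for f, lab in enumerate(labels) if lab == cid]
        members.sort(key=lambda f: (-relevance[f], f))  # ties broken by index
        selected.extend(members[: max(1, math.ceil(ratio * len(members)))])
    return sorted(selected)


if __name__ == "__main__":
    # Toy mixed data: features 0 and 1 are redundant, as are features 2 and 3
    X = [[1, 2, 1, 4], [2, 4, 0, 1], [3, 6, -1, -2],
         [4, 8, -1, -2], [5, 10, 0, 1], [6, 12, 1, 4]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_features(X, y, ratio=0.5))  # [0, 2]
```

On this toy data the clustering puts the two redundant pairs into separate clusters, so a ratio of 0.5 keeps one representative per cluster. Selecting per cluster, rather than taking a global top-k, is what lets the selected subset span different groups of features.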
