Article

The Impact of Feature Importance Methods on the Interpretation of Defect Classifiers

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2021)

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

Volume 48, Issue 7, Pages 2245-2261

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TSE.2021.3056941

Keywords

Software engineering; Computational modeling; Internet; Software quality; Predictive models; Neural networks; Logistics; Model interpretation; model agnostic interpretation; built-in interpretation; feature importance analysis; variable importance

Categories

Computer Science, Software Engineering; Engineering, Electrical & Electronic

Funding

JSPS KAKENHI Japan [JP18H03222]
JSPS International Joint Research Program
SNSF

Summary

The study found that the feature importance ranks computed by classifier-specific and classifier-agnostic methods do not always agree for the same dataset and classifier. The studied classifier-agnostic methods agree strongly with one another for a given dataset and classifier, whereas commonly used classifier-specific methods yield vastly different ranks.

Abstract

Classifier-specific (CS) and classifier-agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence, such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is strong agreement among the different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The feature importance ranks computed by CA and CS methods do not always strongly agree with each other. 2) The feature importance ranks computed by the studied CA methods exhibit strong agreement, including the features reported at the top-1 and top-3 ranks for a given dataset and classifier, while even the commonly used CS methods yield vastly different feature importance ranks. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and that these feature interactions impact the computed feature importance ranks of the CS methods (not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS, improves agreement between the computed feature importance ranks of CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners when performing model interpretation, as well as directions for future research, e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on the computed feature importance ranks of different CS methods.
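
To make the notion of rank agreement concrete, the following is a minimal sketch of how two sets of feature importance scores for the same dataset and classifier could be compared. The abstract does not name the agreement measures, so Kendall's tau and a top-k overlap are used here purely as illustrative assumptions. The feature names and scores (`cs_scores`, `ca_scores`) are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rank_features(importance):
    """Return feature names ordered from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts that share the same features.

    +1 means identical rankings, -1 means fully reversed rankings.
    Tied pairs count as neither concordant nor discordant.
    """
    features = sorted(scores_a)
    concordant = discordant = 0
    for f, g in combinations(features, 2):
        sign = (scores_a[f] - scores_a[g]) * (scores_b[f] - scores_b[g])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(features) * (len(features) - 1) // 2
    return (concordant - discordant) / n_pairs


def top_k_overlap(scores_a, scores_b, k):
    """Fraction of the top-k features that both rankings have in common."""
    top_a = set(rank_features(scores_a)[:k])
    top_b = set(rank_features(scores_b)[:k])
    return len(top_a & top_b) / k


# Hypothetical importance scores for one dataset and one classifier:
# a classifier-specific (CS) method versus a classifier-agnostic (CA) method.
cs_scores = {"loc": 0.40, "churn": 0.30, "complexity": 0.20, "age": 0.10}
ca_scores = {"loc": 0.25, "churn": 0.35, "complexity": 0.10, "age": 0.30}

if __name__ == "__main__":
    print("CS ranking:", rank_features(cs_scores))
    print("CA ranking:", rank_features(ca_scores))
    print("Kendall's tau:", kendall_tau(cs_scores, ca_scores))
    for k in (1, 3):
        print(f"top-{k} overlap:", top_k_overlap(cs_scores, ca_scores, k))
```

In this toy case the two methods disagree on the top-1 feature and share only two of the top-3 features, which is the kind of disagreement the abstract warns can lead to unstable conclusions when methods are used interchangeably.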
