Article

A large empirical assessment of the role of data balancing in machine-learning-based code smell detection

JOURNAL OF SYSTEMS AND SOFTWARE (2020)

Journal

JOURNAL OF SYSTEMS AND SOFTWARE

Volume 169, Issue -, Pages -

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.jss.2020.110693

Keywords

Code smells; Machine learning; Data balancing; Object oriented; Model view controller

Categories

Computer Science, Software Engineering; Computer Science, Theory & Methods

Funding

Excellence of Science Project SECO-Assist, Belgium [0015718F]
European Commission [825040]

Abstract

Code smells can compromise software quality in the long term by inducing technical debt. For this reason, many approaches aimed at identifying these design flaws have been proposed in the last decade. Most of them are based on heuristics in which a set of metrics is used to detect smelly code components. However, these techniques suffer from subjective interpretation, low agreement between detectors, and dependence on thresholds. To overcome these limitations, previous work applied Machine Learning, which can learn from existing datasets without requiring any threshold definition. However, more recent work has shown that Machine Learning is not always suitable for code smell detection because of the highly imbalanced nature of the problem. In this study, we investigate five approaches to mitigating data imbalance in order to understand their impact on Machine-Learning-based code smell detection, both in Object-Oriented systems and in systems implementing the Model-View-Controller pattern. Our findings show that avoiding balancing does not dramatically impact accuracy. Existing data balancing techniques are inadequate for code smell detection, leading to poor accuracy for Machine-Learning-based approaches. Therefore, new metrics that exploit different software characteristics, and new techniques to combine them effectively, are needed. (C) 2020 Elsevier Inc. All rights reserved.
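The abstract does not name the five balancing approaches the study compares. As a rough illustration of what "data balancing" means in this setting, the following sketch implements random oversampling, one of the simplest such techniques, in plain Python. The function name `random_oversample`, the toy metric vectors, and the labelling convention (1 = smelly) are hypothetical and chosen only for illustration; this is not presented as the study's method.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class (random oversampling)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)  # keep all original samples
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # duplicate an existing sample
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: two made-up code metrics per class
# (e.g. lines of code and coupling); label 1 marks a smelly class.
# Smelly classes are rare, so label 1 is a small minority.
X = [[120, 3], [45, 1], [30, 2], [60, 1], [25, 1], [400, 9]]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

In practice, balancing of this kind should be applied only to the training split: oversampling before splitting lets duplicated minority samples appear in both training and test data, which inflates the measured accuracy.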