Article

Comparing and experimenting machine learning techniques for code smell detection

EMPIRICAL SOFTWARE ENGINEERING (2016)

Journal

EMPIRICAL SOFTWARE ENGINEERING

Volume 21, Issue 3, Pages 1143-1191

Publisher

SPRINGER

DOI: 10.1007/s10664-015-9378-4

Keywords

Code smells detection; Machine learning techniques; Benchmark for code smell detection

Category

Computer Science, Software Engineering

Abstract

Several code smell detection tools have been developed, but they provide different results because smells can be subjectively interpreted, and hence detected, in different ways. In this paper, we perform what is, to the best of our knowledge, the largest experiment applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, while the worst performance was achieved by support vector machines. However, the lower prevalence of code smells, i.e., imbalanced data, in the entire data set caused varying performance that needs to be addressed in future studies. We conclude that the application of machine learning to the detection of these code smells can provide high accuracy (>96%), and only a hundred training examples are needed to reach at least 95% accuracy.
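
The abstract's caveat about imbalanced data can be illustrated with a minimal sketch. It assumes a hypothetical data set of 1000 code elements, of which 2% are smelly, and a trivial detector that labels everything as clean. These numbers and the function names are illustrative assumptions, not figures or code from the paper. The point is only that accuracy can look high under low smell prevalence while the F-measure on the smelly class collapses.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means 'smelly'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of elements whose label was predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall on the 'smelly' class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    if tp == 0:
        # No true positives: precision/recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced sample: 20 smelly elements, 980 clean ones.
y_true = [1] * 20 + [0] * 980
# Hypothetical trivial detector that never reports a smell.
always_clean = [0] * 1000

print("accuracy: ", accuracy(y_true, always_clean))
print("F-measure:", f_measure(y_true, always_clean))
```

Under these assumed numbers, the trivial detector is right on every clean element and wrong on every smelly one, which is why accuracy alone is a weak summary when smells are rare.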
