4.5 Review

Machine learning techniques for code smell detection: A systematic literature review and meta-analysis

期刊

INFORMATION AND SOFTWARE TECHNOLOGY
卷 108, 期 -, 页码 115-138

出版社

ELSEVIER
DOI: 10.1016/j.infsof.2018.12.009

关键词

Code smells; Machine learning; Systematic literature review

资金

  1. National Natural Science Foundation of China [61432001, 61802374]
  2. CAS-TWAS
  3. Swiss National Science Foundation through the SNF [PP00P2_170529]

向作者/读者索取更多资源

Background: Code smells indicate suboptimal design or implementation choices in the source code that often lead it to be more change- and fault-prone. Researchers defined dozens of code smell detectors, which exploit different sources of information to support developers when diagnosing design flaws. Despite their good accuracy, previous work pointed out three important limitations that might preclude the use of code smell detectors in practice: (i) subjectiveness of developers with respect to code smells detected by such tools, (ii) scarce agreement between different detectors, and (iii) difficulties in finding good thresholds to be used for detection. To overcome these limitations, the use of machine learning techniques represents an ever increasing research area. Objective: While the research community carefully studied the methodologies applied by researchers when defining heuristic-based code smell detectors, there is still a noticeable lack of knowledge on how machine learning approaches have been adopted for code smell detection and whether there are points of improvement to allow a better detection of code smells. Our goal is to provide an overview and discuss the usage of machine learning approaches in the field of code smells. Method: This paper presents a Systematic Literature Review (SLR) on Machine Learning Techniques for Code Smell Detection. Our work considers papers published between 2000 and 2017. Starting from an initial set of 2456 papers, we found that 15 of them actually adopted machine learning approaches. We studied them under four different perspectives: (i) code smells considered, (ii) setup of machine learning approaches, (iii) design of the evaluation strategies, and (iv) a meta-analysis on the performance achieved by the models proposed so far. Results: The analyses performed show that God Class, Long Method, Functional Decomposition, and Spaghetti Code have been heavily considered in the literature. DECISION TREES and SUPPORT VECTOR MACHINES are the most commonly used machine learning algorithms for code smell detection. Models based on a large set of independent variables have performed well. JRIP and RANDOM FOREST are the most effective classifiers in terms of performance. The analyses also reveal the existence of several open issues and challenges that the research community should focus on in the future. Conclusion: Based on our findings, we argue that there is still room for the improvement of machine learning techniques in the context of code smell detection. The open issues emerged in this study can represent the input for researchers interested in developing more powerful techniques.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据