4.6 Article

Mapping Bug Reports to Relevant Source Code Files Based on the Vector Space Model and Word Embedding

期刊

IEEE ACCESS
卷 7, 期 -, 页码 78870-78881

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2019.2922686

关键词

Bug localization; information retrieval; surface lexical similarity; semantic similarity; bug report; word embedding

资金

  1. National Key Research and Development Program [2016YFC0801804, 2016YFC0801405]
  2. National Natural Science Foundation of China [61806067]

向作者/读者索取更多资源

Although software bug localization in software maintenance and evolution is cumbersome and time-consuming, it is also very important, especially for large-scale software projects. To lighten the workload of developers, researchers have developed various information retrieval (IR)-based bug localization models for automated software support. In this paper, we propose a new method that reduces the time required for bug localization. First, the surface lexical similarity between a bug report and source code file is calculated based on the vector space model. Second, to address the lexical gap between the programming language and natural language, the word vector is used to calculate the semantic similarity between the bug report and source code file. Then, we use surface lexical and semantic similarity to calculate the total similarity for detecting buggy source code files. Our experimental word vectors are derived from Skip-gram and GloVe model training. We select an optimal 100 dimensional word vector for bug localization by evaluating it on four open source software examples. Finally, our experimental results show that our method outperforms classical IR-based methods in locating relevant source code files based on several indicators.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据