Article

Mapping Bug Reports to Relevant Source Code Files Based on the Vector Space Model and Word Embedding

Journal

IEEE ACCESS
Volume 7, Pages 78870-78881

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2019.2922686

Keywords

Bug localization; information retrieval; surface lexical similarity; semantic similarity; bug report; word embedding

Funding

  1. National Key Research and Development Program [2016YFC0801804, 2016YFC0801405]
  2. National Natural Science Foundation of China [61806067]

Abstract

Software bug localization is a cumbersome and time-consuming part of software maintenance and evolution, yet it is very important, especially for large-scale software projects. To lighten the workload of developers, researchers have developed various information retrieval (IR)-based bug localization models that automate the task. In this paper, we propose a new method that reduces the time required for bug localization. First, the surface lexical similarity between a bug report and a source code file is calculated using the vector space model. Second, to bridge the lexical gap between programming language and natural language, word vectors are used to calculate the semantic similarity between the bug report and the source code file. The surface lexical and semantic similarities are then combined into a total similarity used to detect buggy source code files. The word vectors in our experiments are trained with the Skip-gram and GloVe models. We select an optimal 100-dimensional word vector for bug localization by evaluating it on four open-source software projects. Finally, our experimental results show that the method outperforms classical IR-based methods at locating relevant source code files on several evaluation indicators.
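The pipeline the abstract describes (a lexical score from the vector space model, a semantic score from word vectors, and a combined total) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the TF-IDF weighting, the use of averaged word vectors, the linear weight `alpha`, the function names, and the tiny 2-dimensional `toy_embeddings` table, which stands in for trained Skip-gram or GloVe vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF; an assumed weighting scheme, chosen for simplicity.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_dense(a, b):
    """Cosine similarity between two dense vectors stored as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(report, files, embeddings, dim, alpha=0.5):
    """Rank files by alpha * lexical + (1 - alpha) * semantic similarity.

    The linear combination and the default alpha are assumptions.
    """
    names = list(files)
    vectors = tfidf_vectors([files[name] for name in names] + [report])
    report_tfidf = vectors[-1]
    report_mean = mean_vector(report, embeddings, dim)
    scored = []
    for name, file_tfidf in zip(names, vectors):
        lexical = cosine_sparse(report_tfidf, file_tfidf)
        semantic = cosine_dense(report_mean,
                                mean_vector(files[name], embeddings, dim))
        scored.append((name, alpha * lexical + (1 - alpha) * semantic))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 2-dimensional embeddings; real vectors would be 100-dimensional.
toy_embeddings = {
    "crash": [0.9, 0.1], "open": [0.8, 0.3], "file": [0.7, 0.4],
    "read": [0.75, 0.35], "stream": [0.6, 0.5],
    "button": [0.1, 0.9], "click": [0.2, 0.8], "render": [0.15, 0.85],
}
bug_report = ["crash", "open", "file"]
source_files = {
    "FileReader.java": ["file", "read", "stream", "open"],
    "Button.java": ["button", "click", "render"],
}

ranked = rank_files(bug_report, source_files, toy_embeddings, dim=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup the report shares the terms "open" and "file" with `FileReader.java` and none with `Button.java`, so the lexical score alone already separates them; the semantic score matters for terms such as "crash" that appear in no source file but lie close to file-related words in the embedding space.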
