Article

Bug localization using latent Dirichlet allocation

INFORMATION AND SOFTWARE TECHNOLOGY (2010)

Journal

INFORMATION AND SOFTWARE TECHNOLOGY

Volume 52, Issue 9, Pages 972-990

Publisher

ELSEVIER

DOI: 10.1016/j.infsof.2010.04.002

Keywords

Bug localization; Program comprehension; Latent Dirichlet allocation; Information retrieval

Funding

National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Computing and Communication Foundations [CCF-0915403, CCF-0915559]

Abstract

Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown effective in topic model based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness.

Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods.

Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems.

Results: The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results of the studies also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base.

Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA. We also conclude that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.

© 2010 Elsevier B.V. All rights reserved.
