4.0 Article

Improving IR-based traceability recovery via noun-based indexing of software artifacts

Journal

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS
Volume 25, Issue 7, Pages 743-762

Publisher

WILEY
DOI: 10.1002/smr.1564

Keywords

traceability recovery; vector space model; latent semantic indexing; Jensen-Shannon method; empirical studies

Ask authors/readers for more resources

One of the most successful applications of textual analysis in software engineering is the use of information retrieval (IR) methods to reconstruct traceability links between software artifacts. Unfortunately, because of the limitations of both the humans developing artifacts and the IR techniques any IR-based traceability recovery method fails to retrieve some of the correct links, while on the other hand it also retrieves links that are not correct. This limitation has posed challenges for researchers that have proposed several methods to improve the accuracy of IR-based traceability recovery methods by removing the noise' in the textual content of software artifacts (e.g., by removing common words or increasing the importance of critical terms). In this paper, we propose a heuristic to remove the noise' taking into account the linguistic nature of words in the software artifacts. In particular, the language used in software documents can be classified as a technical language, where the words that provide more indication on the semantics of a document are the nouns. The results of a case study conducted on five software artifact repositories indicate that characterizing the context of software artifacts considering only nouns significantly improves the accuracy of IR-based traceability recovery methods. Copyright (c) 2012 John Wiley & Sons, Ltd.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.0
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available