Article

Tackling the term-mismatch problem in automated trace retrieval

Journal

EMPIRICAL SOFTWARE ENGINEERING
Volume 22, Issue 3, Pages 1103-1142

Publisher

SPRINGER
DOI: 10.1007/s10664-016-9479-8

Keywords

Requirements engineering; Traceability; Query augmentation; Semantic traceability

Funding

  1. US National Science Foundation [CCF-1319680, CCF-0447594]
  2. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [1649008]
  3. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations [1319680, 1649448]

Software systems operating in safety- or security-critical domains must comply with an increasingly large and complex set of regulatory standards. Compliance is partially demonstrated by establishing trace links between requirements and regulatory codes. Such links can be constructed manually or through semi-automated techniques in which the text of the regulatory code is used to formulate an information retrieval query. However, trace retrieval solutions are not effective when significant vocabulary mismatches exist between regulatory codes and product-level requirements. This paper describes and compares three query augmentation techniques for addressing the term-mismatch problem and improving the quality of trace links generated between regulatory codes and requirements. The first trains a classifier to replace the original query with terms learned from a training set of regulation-to-requirements trace links. The second replaces the original query with terms learned through web mining. The third uses a domain ontology to augment the query terms; the ontology is constructed manually using a guided approach that leverages existing traceability knowledge. All three techniques were evaluated against security regulations from the US government's Health Insurance Portability and Accountability Act (HIPAA), traced against ten healthcare-related requirements specifications. The classification approach returned the best results; however, improvements were observed with both the classification and ontology-based solutions. The web-mining technique showed improvements in only a subset of queries. The three query augmentation techniques offer tradeoffs in terms of performance, cost and effort, and usage viability within a specific project context.
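To make the term-mismatch problem concrete, the following is a minimal sketch of ontology-style query augmentation, not the authors' implementation. The toy ontology, the sample regulation and requirements, and all function names are hypothetical, and ranking uses a plain term-frequency cosine similarity as a stand-in for whatever retrieval model the paper evaluates.

```python
import math
from collections import Counter

# Hypothetical toy ontology: regulatory term -> related requirement-level terms.
ONTOLOGY = {
    "authentication": ["login", "password", "credentials"],
    "audit": ["log", "record"],
}

# Hypothetical sample data, written only for this illustration.
REGULATION = "Implement procedures for person authentication"
REQUIREMENTS = [
    "The system shall require a password at login.",
    "The system shall display patient appointments.",
]


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def augment(query, ontology):
    """Return the query terms plus any related terms found in the ontology."""
    terms = tokenize(query)
    related = [rel for t in terms for rel in ontology.get(t, [])]
    return terms + related


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query_terms, requirements):
    """Score every requirement against the query, highest score first."""
    scored = [(cosine(query_terms, tokenize(r)), r) for r in requirements]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for label, terms in [
        ("original", tokenize(REGULATION)),
        ("augmented", augment(REGULATION, ONTOLOGY)),
    ]:
        print(label)
        for score, requirement in rank(terms, REQUIREMENTS):
            print(f"  {score:.3f}  {requirement}")
```

In this toy data the original regulation shares no terms with either requirement, so both score zero; once "authentication" is expanded to related terms, the password requirement gains a non-zero score and ranks first. A real ontology would need the guided manual construction the abstract describes.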
