Article

Extraction of domain concepts from the source code

Journal

SCIENCE OF COMPUTER PROGRAMMING
Volume 98, Pages 680-706

Publisher

ELSEVIER
DOI: 10.1016/j.scico.2014.09.012

Keywords

Program understanding; Concept extraction; Domain concept filtering; Concept location; Information retrieval

Program understanding involves mapping domain concepts to the code elements that implement them. Such mapping is often implicit and undocumented. However, identifier names contain relevant clues to rediscover the mapping and make it available to programmers. In this paper, we present two approaches that exploit structural and linguistic aspects of the source code to extract ontologies. The extracted ontologies are then compared in terms of the concepts they contain and the support they give to program understanding, specifically concept location. Such ontologies are composed of both domain and implementation concepts, as they appear in the source code. To filter domain concepts, we have applied Information Retrieval (IR)-based filtering techniques. We have assessed the resulting ontologies against a reference, manually defined, domain ontology. The experimentation was carried out using six real-world open source programs. Results show that the ontologies extracted using the structural and linguistic aspects of the source code are complementary. We also observed that their union gives better support to concept location than the individual ontologies. Filtering the ontologies gives a concise representation of the domain knowledge captured in the source code. The filtered ontologies, however, have been found to be less effective in supporting concept location than the unfiltered ontologies. (C) 2014 Elsevier B.V. All rights reserved.
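The abstract does not say which IR-based filtering techniques were applied. As an illustration only, the Python sketch below assumes a simple TF-IDF score. Identifiers are split into terms at camelCase, acronym and underscore boundaries. Terms that occur in every file, such as `get` and `set`, receive a score of zero and are dropped as likely implementation vocabulary. The surviving terms are then compared with a reference concept set using precision and recall. The functions `split_identifier`, `tfidf_filter` and `precision_recall`, the threshold, and the toy input are all hypothetical and are not the paper's method.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase / snake_case identifier into lowercase terms (assumed heuristic)."""
    terms = []
    for part in re.split(r"[_\W\d]+", name):
        # Acronym runs ("HTTP"), capitalised words ("Request"), lowercase runs ("parse").
        terms.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z]|$)|[A-Z]?[a-z]+|[A-Z]+", part))
    return [t.lower() for t in terms]


def tfidf_filter(docs, threshold):
    """Keep terms whose best TF-IDF score over all documents reaches the threshold.

    docs maps a document name (e.g. a source file) to the identifiers it declares.
    TF-IDF is an assumed stand-in for the paper's unspecified IR filter.
    """
    counts = {d: Counter(t for ident in idents for t in split_identifier(ident))
              for d, idents in docs.items()}
    n = len(counts)
    df = Counter()  # number of documents containing each term
    for c in counts.values():
        df.update(c.keys())
    best = {}
    for c in counts.values():
        total = sum(c.values())
        for term, freq in c.items():
            score = (freq / total) * math.log(n / df[term])  # a term in every doc scores 0
            best[term] = max(best.get(term, 0.0), score)
    return {t for t, s in best.items() if s >= threshold}


def precision_recall(extracted, reference):
    """Compare extracted concept names with a reference (manually defined) concept set."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy input: three "source files" and their identifiers.
docs = {
    "Account.java": ["Account", "getBalance", "setBalance", "withdrawAmount"],
    "Customer.java": ["Customer", "getName", "setName", "openAccount"],
    "Util.java": ["getInstance", "setDebug", "toString"],
}
reference = {"account", "balance", "withdraw", "amount", "customer", "name"}

kept = tfidf_filter(docs, threshold=0.05)
p, r = precision_recall(kept, reference)
print(sorted(kept))
print(f"precision={p:.2f} recall={r:.2f}")
```

On this toy input the filter removes `get` and `set` but keeps utility terms such as `debug` and `string`, which lowers precision. This shows why a purely frequency-based filter cannot fully separate domain from implementation concepts. A union of two extracted concept sets, as in the abstract's comparison of the structural and linguistic ontologies, could be scored with the same `precision_recall` helper by passing `a | b`.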