Journal
JOURNAL OF COMPUTING IN CIVIL ENGINEERING
Volume 31, Issue 6
Publisher
ASCE-AMER SOC CIVIL ENGINEERS
DOI: 10.1061/(ASCE)CP.1943-5487.0000701
Keywords
Heterogeneous data terminology; Data sharing; Semantic interoperability; Semantic relation; Natural language processing; Vector space model; Transportation data
Funding
- National Science Foundation (NSF) [NSF-CIS 420-60-83]
- NSF
Abstract
Inconsistent data terminology poses major challenges for integrating transportation project data from distinct sources. Differences in the meaning of data elements may lead to miscommunication between data senders and receivers. Semantic relations between terms in digital dictionaries, such as ontologies, can make the semantics of a data element transparent and unambiguous to computer systems. However, because effective automated methods are lacking, identifying these relations is labor intensive and time consuming. This paper presents a novel integrated methodology that leverages multiple computational techniques to extract heterogeneous American-English data terms used in different highway agencies, together with their semantic relations, from design manuals and other technical specifications. The proposed method applies natural language processing (NLP) to detect data elements in text documents and uses machine learning to determine the semantic relatedness among terms from their occurrence statistics in a corpus. The study also develops an algorithm that classifies semantically related terms into three lexical groups: synonymy, hyponymy, and meronymy. The key merit of this technique is that the detection of semantic relations uses only linguistic information in texts and does not depend on existing hand-coded semantic resources. In a case study, the proposed method was applied to a 16-million-word corpus of roadway design manuals to extract and classify roadway data items. The developed classifier was evaluated using a human-encoded test set, and the results show an overall performance of 92.76% precision and 81.02% recall. © 2017 American Society of Civil Engineers.
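To make the vector space idea in the abstract concrete, the following is a minimal sketch of how relatedness between terms could be estimated from occurrence statistics alone. It is not the authors' implementation: the window size, the raw co-occurrence counts, the cosine measure, the function names `cooccurrence_vectors` and `cosine`, and the three-sentence toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus invented for illustration; not taken from the paper's data.
corpus = [
    "install guardrail along the roadway shoulder".split(),
    "install guiderail along the roadway shoulder".split(),
    "the culvert diameter depends on drainage flow".split(),
]

vectors = cooccurrence_vectors(corpus, window=2)
print(cosine(vectors["guardrail"], vectors["guiderail"]))
print(cosine(vectors["guardrail"], vectors["culvert"]))
```

In this toy corpus, "guardrail" and "guiderail" occur in identical contexts and therefore score much higher than "guardrail" and "culvert", which share only one context word. A relatedness score of this kind says nothing about the type of relation; separating synonymy, hyponymy, and meronymy would need the additional classification step the abstract describes.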