
NLP-Based Approach to Semantic Classification of Heterogeneous Transportation Asset Data Terminology

JOURNAL OF COMPUTING IN CIVIL ENGINEERING (2017)

Journal

JOURNAL OF COMPUTING IN CIVIL ENGINEERING

Volume 31, Issue 6

Publisher

ASCE-AMER SOC CIVIL ENGINEERS

DOI: 10.1061/(ASCE)CP.1943-5487.0000701

Keywords

Heterogeneous data terminology; Data sharing; Semantic interoperability; Semantic relation; Natural language processing; Vector space model; Transportation data

Funding

National Science Foundation (NSF) [NSF-CIS 420-60-83]

Abstract

Inconsistent data terminology poses major challenges for integrating transportation project data from distinct sources. Differences in the meaning of data elements may lead to miscommunication between data senders and receivers. Semantic relations between terms in digital dictionaries, such as ontologies, can make the semantics of a data element transparent and unambiguous to computer systems. However, because effective automated methods are lacking, identifying these relations is labor intensive and time consuming. This paper presents a novel integrated methodology that leverages multiple computational techniques to extract heterogeneous American-English data terms used in different highway agencies, together with their semantic relations, from design manuals and other technical specifications. The proposed method uses natural language processing (NLP) to detect data elements in text documents and machine learning to determine the semantic relatedness among terms from their occurrence statistics in a corpus. The study also develops an algorithm that classifies semantically related terms into three lexical groups: synonymy, hyponymy, and meronymy. The key merit of this technique is that the detection of semantic relations uses only linguistic information in texts and does not depend on existing hand-coded semantic resources. A case study applied the proposed method to a 16-million-word corpus of roadway design manuals to extract and classify roadway data items. The developed classifier was evaluated using a human-encoded test set, and the results show an overall precision of 92.76% and recall of 81.02%. © 2017 American Society of Civil Engineers.
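
The abstract describes estimating semantic relatedness between terms from their occurrence statistics in a corpus, and the keywords mention a vector space model. As a rough illustration of that general idea only, the sketch below builds co-occurrence count vectors from a tiny invented corpus and compares terms by cosine similarity. It is not the authors' method: the window size, the toy sentences, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus invented for illustration; it is not taken from the paper's data.
toy_corpus = [
    "install the guardrail along the roadway edge".split(),
    "install the guiderail along the roadway edge".split(),
    "measure the pavement thickness at each station".split(),
]

vectors = cooccurrence_vectors(toy_corpus, window=2)

# Terms used in the same contexts get similar vectors, hence a high score.
sim_related = cosine(vectors["guardrail"], vectors["guiderail"])
# Terms used in different contexts share little beyond function words.
sim_unrelated = cosine(vectors["guardrail"], vectors["pavement"])

print(f"guardrail vs guiderail: {sim_related:.3f}")
print(f"guardrail vs pavement:  {sim_unrelated:.3f}")
```

In this toy setting the two variant spellings appear in identical contexts, so they score far higher than the unrelated pair. A relatedness score of this kind says nothing about the type of relation; separating synonymy, hyponymy, and meronymy is the job of the classification algorithm the abstract mentions, which this sketch does not attempt.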
