Proceedings Paper

Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics

Journal

ADVANCED COMPUTING STRATEGIES FOR ENGINEERING, PT II
Volume 10864, Pages 130-149

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-319-91638-5_7

Keywords

Named entity normalization; Big data analytics; Bridge deterioration prediction

Funding

  1. Strategic Research Initiatives (SRI) Program by the College of Engineering at the University of Illinois at Urbana-Champaign

The large amount of multi-type and multi-source bridge data opens unprecedented opportunities for big data analytics to improve bridge deterioration prediction. Information fusion is needed prior to the analytics to transform the heterogeneous data from different sources into a unified representation. Resolving the ambiguities in the named entities extracted from bridge inspection reports is one of the most important fusion tasks. The ambiguity stems from the use of different and ambiguous surface forms for the same target named entity. There is, thus, a need for named entity normalization (NEN) methods that can map these ambiguous surface forms into their canonical form, an identifier concept. However, existing NEN methods are limited in this regard, because they mostly require pre-established knowledge (e.g., dictionaries or Wikipedia) and/or training data, and mostly ignore the impact of the normalization on data analytics. To address this need, this paper proposes an unsupervised NEN method. It includes two main components: candidate identifier concept generation based on multi-grams of each named entity set, and candidate identifier concept ranking based on a proposed ranking function. The function uses the TF-IDF (term frequency-inverse document frequency) weight and is further improved by considering the impacts of gram lengths and positions on the ranking. It aims to balance the abstractness and detailedness of the identifier concepts, so as to ensure that the resulting data are neither too dense nor too sparse for the analytics. A set of experiments was conducted to evaluate the performance of the proposed method, which achieved an accuracy of 84.5%.
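
The two components described in the abstract (multi-gram candidate generation, then TF-IDF ranking adjusted for gram length and position) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual ranking function, so everything below is an assumption made for illustration: the function names `ngrams` and `rank_candidates`, the parameters `max_n`, `length_weight`, and `position_weight`, the smoothed IDF formula, the multiplicative way the length and position factors are combined, and the idea that grams ending nearer the end of a surface form score higher.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield (gram, start_index) for every contiguous gram of 1..max_n tokens."""
    for n in range(1, min(max_n, len(tokens)) + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n]), i


def rank_candidates(entity_set, all_sets, max_n=3,
                    length_weight=0.5, position_weight=0.5):
    """Rank candidate identifier concepts for one set of surface forms.

    entity_set: surface forms believed to refer to the same entity.
    all_sets:   every entity set in the corpus; each set acts as one
                "document" for the IDF term.
    The weights are hypothetical and are not taken from the paper.
    """
    # Document frequency: in how many entity sets does each gram occur?
    df = Counter()
    for forms in all_sets:
        grams_in_set = {g for f in forms for g, _ in ngrams(f.lower().split(), max_n)}
        df.update(grams_in_set)
    n_docs = len(all_sets)

    # Term frequency and accumulated relative end position within this set.
    tokenized = [form.lower().split() for form in entity_set]
    tf = Counter()
    pos_sum = Counter()
    for tokens in tokenized:
        for gram, start in ngrams(tokens, max_n):
            tf[gram] += 1
            # Assumption: grams ending closer to the end of the form
            # (often the head noun in English) are more informative.
            pos_sum[gram] += (start + len(gram)) / len(tokens)

    scores = {}
    for gram, freq in tf.items():
        idf = math.log((1 + n_docs) / (1 + df[gram])) + 1  # smoothed IDF
        tfidf = freq / len(tokenized) * idf
        length_factor = 1 + length_weight * (len(gram) - 1)   # favors detail
        position_factor = 1 + position_weight * (pos_sum[gram] / freq)
        scores[" ".join(gram)] = tfidf * length_factor * position_factor

    # Highest-scoring candidate first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data invented for this example; not from the paper's dataset.
    deck = ["concrete deck", "deck", "bridge deck", "reinforced concrete deck"]
    corpus = [deck, ["steel girder", "girder", "main girder"],
              ["expansion joint", "joint"]]
    for concept, score in rank_candidates(deck, corpus)[:3]:
        print(f"{concept}: {score:.3f}")
```

In this sketch the frequency term rewards short grams shared by many surface forms (more abstract concepts), while `length_weight` rewards longer grams (more detailed concepts). Tuning the two against each other is one simple way to approximate the balance between abstractness and detailedness that the abstract describes.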
