Article

Named entity recognition for Chinese construction documents based on conditional random field

Journal

FRONTIERS OF ENGINEERING MANAGEMENT
Volume 10, Issue 2, Pages 237-249

Publisher

HIGHER EDUCATION PRESS
DOI: 10.1007/s42524-021-0179-8

Keywords

NER; NLP; Chinese language; construction document


Named entity recognition (NER) plays a crucial role in construction management. This study introduces an NER method for Chinese construction documents based on a conditional random field (CRF). The method combines a corpus design pipeline with a CRF model to identify named entities, giving downstream applications a basis for improving construction management efficiency.
Named entity recognition (NER) is essential in many natural language processing (NLP) tasks such as information extraction and document classification. A construction document usually contains critical named entities, and an effective NER method can provide a solid foundation for downstream applications to improve construction management efficiency. This study presents an NER method for Chinese construction documents based on a conditional random field (CRF), comprising a corpus design pipeline and a CRF model. The corpus design pipeline identifies typical NER tasks in construction management, enables word-based tokenization, and controls annotation consistency with a newly designed annotation specification. The CRF model engineers nine transformation features and seven classes of state features, covering the impacts of word position, part-of-speech (POS), and word/character states within the context. The F1-measure on a labeled construction data set is 87.9%. Furthermore, as more domain knowledge features are infused, the marginal performance improvement from including POS information decreases, which points to POS customization as a promising research direction for improving NLP performance with limited data.
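
The state features mentioned in the abstract (word position, POS, and word/character states within the context) can be illustrated with a minimal sketch. This is not the authors' feature set: the function names, the feature keys, the window size, the POS tags, and the example sentence are all assumptions chosen for illustration, and the paper's seven feature classes are not reproduced here.

```python
from typing import Dict, List, Tuple

# A token is a (word, POS tag) pair produced by word-based tokenization.
# The tag names used below ("n", "v") are assumed, not taken from the paper.
Token = Tuple[str, str]


def state_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, object]:
    """Build illustrative state features for the token at index i.

    Assumes every word is a non-empty string.
    """
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        # Word and character states of the current token.
        "word": word,
        "first_char": word[0],
        "last_char": word[-1],
        "length": len(word),
        # Part-of-speech of the current token.
        "pos": pos,
        # Position of the token within the sentence.
        "is_first": i == 0,
        "is_last": i == len(sent) - 1,
    }
    # Word and POS of neighbouring tokens inside the context window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(sent):
            ctx_word, ctx_pos = sent[j]
            feats[f"{offset:+d}:word"] = ctx_word
            feats[f"{offset:+d}:pos"] = ctx_pos
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [state_features(sent, i) for i in range(len(sent))]


# Hypothetical word-segmented sentence, roughly
# "the contractor is responsible for foundation works".
example: List[Token] = [("承包商", "n"), ("负责", "v"), ("地基", "n"), ("工程", "n")]
features = sentence_features(example)
```

Per-token dictionaries in this shape could be passed to a CRF toolkit that accepts feature maps. Features over pairs of adjacent labels (the counterpart of the state features above) are typically handled by the CRF model itself rather than by a function like this one.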

