Article

Automated content analysis for construction safety: A natural language processing system to extract precursors and outcomes from unstructured injury reports

AUTOMATION IN CONSTRUCTION (2016)

Journal

AUTOMATION IN CONSTRUCTION

Volume 62, Pages 45-56

Publisher

ELSEVIER

DOI: 10.1016/j.autcon.2015.11.001

Keywords

Automated content analysis; Natural language processing; Text mining; Knowledge extraction; Accident; Injury; Safety; Attribute; Risk; R

Funding

National Science Foundation through an Early Career Award (CAREER) Program
National Science Foundation [1253179]
Bentley Systems
Directorate For Engineering
Div Of Civil, Mechanical, & Manufact Inn [1253179] Funding Source: National Science Foundation

Abstract

In the United States, as in many other countries, construction workers are more likely to be injured on the job than workers in any other industry. This poor safety performance causes huge human and financial losses and has motivated extensive research. Unfortunately, safety improvement in construction has decelerated over the last decade, and traditional safety programs have reached saturation. Yet major construction companies and federal agencies possess a wealth of empirical knowledge in the form of huge databases of digital construction injury reports. This knowledge could be used to better understand, predict, and prevent construction accidents. Because no clear methodology exists and manual large-scale content analysis is costly, these valuable data have yet to be extracted and leveraged. Recently, researchers have proposed a framework that allows meaningful empirical data to be extracted from accident reports. However, the resource limitations inherent to manual content analysis remain. The present study tested the proposition that manual content analysis of injury reports can be eliminated using natural language processing (NLP). This paper describes (1) the overall strategy and methodology used in developing the system, and specifically how key challenges with decoding unstructured reports were overcome; (2) how the system was built through an iterative process of coding and testing against manual content analysis results from a team of seven independent analysts; and (3) the implications and potential uses of the extracted data. The results indicate that the NLP system can quickly and automatically scan unstructured injury reports for 101 attributes and outcomes with over 95% accuracy.
The main contribution of this research is to empower any organization to quickly obtain a large and highly reliable structured attribute and outcome data set from its databases of unstructured accident reports. Such structured data are a necessary prerequisite for applying statistical modeling techniques, which in turn allow new safety knowledge to be extracted and, ultimately, safety management to be improved. © 2015 Elsevier B.V. All rights reserved.
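The abstract does not give the system's rules, so the following is only a minimal sketch of the general idea: scanning a free-text report for attributes and comparing the result with a manual coding. It assumes a simple keyword-dictionary approach, which may differ from the authors' method. The attribute names, regular expressions, and the `extract_attributes` and `agreement` helpers are all hypothetical and written in Python for illustration.

```python
import re

# Hypothetical dictionary: attribute name -> regex patterns that signal it.
# These entries are illustrative only and are not taken from the paper.
ATTRIBUTE_PATTERNS = {
    "ladder": [r"\bladders?\b"],
    "hammer": [r"\bhammers?\b"],
    "slippery surface": [r"\bslipp(?:ery|ed)\b", r"\bicy\b"],
}


def extract_attributes(report):
    """Return the set of attributes whose patterns occur in the report text."""
    text = report.lower()
    return {
        name
        for name, patterns in ATTRIBUTE_PATTERNS.items()
        if any(re.search(pattern, text) for pattern in patterns)
    }


def agreement(predicted, manual, attributes):
    """Fraction of attributes on which automated and manual coding agree
    (both present or both absent)."""
    matches = sum((a in predicted) == (a in manual) for a in attributes)
    return matches / len(attributes)


report = "Worker slipped on an icy step while climbing a ladder."
found = extract_attributes(report)

# Assumed manual coding by a human analyst for the same report.
manual_coding = {"ladder"}
score = agreement(found, manual_coding, list(ATTRIBUTE_PATTERNS))
print(sorted(found), round(score, 2))
```

Iterating on the dictionary until the automated output matches the analysts' coding mirrors, in a very simplified way, the coding-and-testing loop the abstract describes.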
