4.6 Article

Integrating Information Entropy and Latent Dirichlet Allocation Models for Analysis of Safety Accidents in the Construction Industry

期刊

BUILDINGS
卷 13, 期 7, 页码 -

出版社

MDPI
DOI: 10.3390/buildings13071831

关键词

text mining; information entropy TF-IDF; latent Dirichlet allocation (LDA); accident investigation report; construction industry

向作者/读者索取更多资源

Construction accident investigation reports are difficult to analyze due to the voluminous Chinese text. To overcome this problem, a novel approach combining text mining techniques and LDA models is proposed to identify the key factors leading to safety accidents in the Chinese construction industry.
Construction accident investigation reports contain critical information, but extracting useful insights from the voluminous Chinese text is challenging. Traditional methods rely on expert judgment, which leads to time-consuming and potentially inaccurate results. To overcome this problem, we propose a novel approach that combines text mining techniques and latent Dirichlet allocation (LDA) models to analyze standardized accident investigation reports in the Chinese construction industry. The proposed method integrates an information entropy term frequency-inverse document frequency (TF-IDF) weighting scheme to evaluate term importance and accounts for word and model uncertainty. The method was applied to a set of construction industry accident reports to identify the key factors leading to safety accidents. The results show that the causal factors of accidents in Chinese accident investigation reports consist of keywords and negative expressions, including failure to timely identify safety hazards and inadequate site safety management. Failure to timely identify safety hazards is the most common factor in accident investigation reports, and the negative expressions commonly used in the reports include not timely and not in place. The information entropy TF-IDF method is superior to traditional methods in terms of accuracy and efficiency, and the LDA model that considers word frequency and feature weights is better able to capture the underlying themes in the Chinese corpus. And the subject terms that make up the themes contain more information about the causes of accidents. This approach helps site managers more quickly and effectively understand the causal factors and key messages that lead to accidents from incident reports. It gives site managers insight into common patterns and themes associated with safety incidents, such as unsafe practices, hazardous work environments, and non-compliance with safety regulations. This enables them to make informed decisions to improve safety management practices.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据