4.7 Article

Text mining of accident reports using semi-supervised keyword extraction and topic modeling

Journal

PROCESS SAFETY AND ENVIRONMENTAL PROTECTION
Volume 155, Issue -, Pages 455-465

Publisher

ELSEVIER
DOI: 10.1016/j.psep.2021.09.022

Keywords

Accidents; Text mining; Document classification; Aviation Safety Reporting System (ASRS); Pipeline and Hazardous Materials Safety Administration (PHMSA)

This paper introduces an automated, semi-supervised approach for analyzing accident reports that identifies domain-specific keywords and groups them into topics for use in data mining tasks such as classification. The method achieved an average classification accuracy of 80% across case studies in two domains and can generate domain-specific predictive models with limited manual intervention.
Learning from past incidents is critical to achieving and maintaining high process safety performance. Accident and incident records provide one way to learn; however, they usually take the form of unstructured text, which makes analysis difficult. Recently, text mining methods based on supervised learning have been proposed for analyzing accident reports; however, they require an impractically large number of labeled records as training examples. This paper proposes an automated, semi-supervised, domain-independent approach for analyzing accident reports. Given a set of user-defined classification topics and domain literature such as handbooks, glossaries, and Wikipedia articles, the method can identify domain-specific keywords and group them into topics with minimal expert involvement. These keywords and topics can then be used for various data mining purposes, including classification. The proposed approach is demonstrated using two case studies from different domains: (1) in aviation, to identify the stage of flight when an accident occurs, and (2) in the process industry, to identify the cause of pipeline accidents. The average classification accuracy of the proposed method was 80%, which is comparable to that of supervised learning methods. The key benefit of this approach is that it can generate domain-specific predictive models with limited manual intervention. © 2021 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
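
To make the idea of classifying reports by topic keywords concrete, the following is a minimal sketch only. It is not the authors' method: the abstract does not describe how keywords are extracted or scored, so the keyword lists, the `TOPIC_KEYWORDS` name, the `tokenize` and `classify` helpers, and the simple match-count scoring are all assumptions made for illustration. In the paper, the keyword sets would be derived from domain literature rather than written by hand.

```python
import re
from collections import Counter

# Hypothetical, hand-written keyword sets for three stages of flight.
# These are illustrative assumptions, not keywords taken from the paper.
TOPIC_KEYWORDS = {
    "takeoff": {"takeoff", "rotation", "runway", "departure"},
    "cruise": {"cruise", "altitude", "enroute", "turbulence"},
    "landing": {"landing", "approach", "touchdown", "flare"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(report, topic_keywords=TOPIC_KEYWORDS):
    """Score each topic by counting keyword occurrences in the report.

    Returns (best_topic, scores); best_topic is None when no keyword matches.
    """
    counts = Counter(tokenize(report))
    scores = {
        topic: sum(counts[word] for word in words)
        for topic, words in topic_keywords.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    report = (
        "During the approach the aircraft bounced on touchdown "
        "and veered off the runway."
    )
    topic, scores = classify(report)
    print(topic, scores)
```

Counting raw keyword matches is the simplest possible scoring rule; a real system would likely weight keywords and handle multi-word terms, which this sketch does not attempt.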
