4.7 Article

A comparison of statistical learning methods for deriving determining factors of accident occurrence from an imbalanced high resolution dataset

Journal

ACCIDENT ANALYSIS AND PREVENTION
Volume 127, Issue -, Pages 134-149

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.aap.2019.02.008

Keywords

Statistical learning; Imbalanced data; Binary classification; Accident analysis; Road safety

One of the main aims of accident data analysis is to derive the determining factors associated with road traffic accident occurrence. While current studies mainly use variants of count data regression to achieve this aim, the problem can also be considered as a binary classification task, with the dichotomous target variable indicating events (accidents) and non-events (no accidents). The effects of 45 variables - describing road condition and geometry, traffic volume and regulations, weather, and accident time - are analyzed using a dataset of high temporal (1 h) and spatial (250 m) resolution, covering the whole highway network of Austria over a period of four consecutive years. A combination of synthetic minority oversampling and maximum dissimilarity undersampling is used to balance the training dataset. We employ and compare a series of statistical learning techniques with respect to their predictive performance and discuss the importance of determining factors of accident occurrence from the ensemble of models. Findings substantiate that a trade-off between accuracy and sensitivity is inherent to imbalanced classification problems. Results show satisfactory performance of tree-based methods, which achieve accuracies between 75% and 90% at sensitivities between 30% and 50%. Overall, this analysis emphasizes the merits of using high-resolution data in the context of accident analysis.
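The two ideas the abstract leans on, synthetic minority oversampling and the accuracy/sensitivity trade-off, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote_like` and `accuracy_and_sensitivity`, the neighbour count `k`, and all counts in the example are hypothetical and chosen only for illustration. Maximum dissimilarity undersampling is not shown.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling sketch (hypothetical helper): each synthetic
    point is a random interpolation between a minority point and one of its
    k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # Sort the remaining minority points by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy = (TP + TN) / N; sensitivity = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity

# Made-up imbalanced data: 10 accidents (1) among 1000 observations.
y_true = [1] * 10 + [0] * 990

# Predicting "no accident" everywhere: accuracy 0.99, sensitivity 0.0.
print(accuracy_and_sensitivity(y_true, [0] * 1000))

# Catching 5 of 10 accidents at the cost of 100 false alarms:
# accuracy 0.895, sensitivity 0.5.
y_pred = [1] * 5 + [0] * 5 + [1] * 100 + [0] * 890
print(accuracy_and_sensitivity(y_true, y_pred))
```

In the made-up example, the classifier that never predicts an accident scores highest on accuracy yet detects nothing, which is why accuracy alone is a poor criterion on imbalanced data and why raising sensitivity typically costs some accuracy.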
