Article

Interpretable machine learning models for crime prediction

Journal

COMPUTERS ENVIRONMENT AND URBAN SYSTEMS
Volume 94

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.compenvurbsys.2022.101789

Keywords

Crime prediction; Machine learning; XGBoost; Model interpretability; SHAP value

Funding

  1. Natural Science Foundation of Guangdong Province

This study uses the interpretability of advanced machine learning models to overcome the difficulty of estimating variable contributions in crime prediction. Based on routine activity theory and crime pattern theory, 17 variables are selected for crime prediction, and the Shapley additive explanation (SHAP) method is used to discern the contribution of each variable. The proportion of the non-local population and the ambient population aged 25-44 contribute the most to crime prediction. Local models show which factors matter most at each location, while the global model identifies the factors that matter for the entire region.
The relationship between crime patterns and associated variables has drawn considerable attention, and these variables play a critical role in crime prediction. Traditional regression models can reveal the contribution of each variable, but they are not optimal for crime prediction. Machine learning models predict crime more effectively, but most of them cannot estimate the contribution of each individual variable. This study aims to overcome this limitation by taking advantage of the interpretability of advanced machine learning models. Based on routine activity theory and crime pattern theory, the study selects 17 variables for crime prediction. The XGBoost algorithm is adopted to train the prediction model. A post-hoc interpretation method, Shapley additive explanation (SHAP), is used to discern the contribution of individual variables. A variable with a larger absolute SHAP value contributes more to the crime prediction. In addition to the global model for the entire area, a local model is calibrated at each study unit, revealing the spatial variation in each variable's contribution. Among the 17 variables used in this model, the proportion of the non-local population and the ambient population aged 25-44 contribute more than the other variables to predicting crime. The larger the ambient population aged 25-44 in an area, the more public thefts occur there. Additionally, local SHAP values are mapped to show each variable's contribution to the crime prediction model across the study area. The results of the local models can help the police tackle the most important factors at each location, while the global model identifies the important factors for the entire region.
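
To make the local and global readings of SHAP concrete, the following is a minimal sketch and not the authors' pipeline. The study trains an XGBoost model; here a small hand-written function (`toy_model`) stands in for the trained model, and Shapley values are computed by brute-force enumeration of feature subsets rather than with a SHAP library. The feature names, coefficients, and study-unit values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial
from statistics import mean

# Hypothetical feature names; the study uses 17 variables, this toy uses 3.
FEATURES = ["nonlocal_share", "ambient_25_44", "poi_density"]


def toy_model(row):
    """Stand-in for a trained predictor (an assumption, not the paper's model)."""
    nonlocal_share, ambient, poi = row
    return 4.0 * nonlocal_share + 2.0 * ambient + 0.5 * poi + nonlocal_share * ambient


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by values from each
    background row, and the predictions are averaged.
    """
    n = len(x)

    def value(subset):
        # Expected prediction with features in `subset` fixed to x's values.
        return mean(
            predict([x[j] if j in subset else row[j] for j in range(n)])
            for row in background
        )

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical study units (rows) used both as instances and as background.
UNITS = [
    [0.2, 1.0, 3.0],
    [0.5, 2.0, 1.0],
    [0.8, 3.0, 2.0],
    [0.1, 0.5, 4.0],
]

# Local explanation: one vector of Shapley values per study unit.
local_shap = [shapley_values(toy_model, unit, UNITS) for unit in UNITS]

# Global importance: mean absolute Shapley value of each feature across units.
global_importance = {
    name: mean(abs(phi[j]) for phi in local_shap)
    for j, name in enumerate(FEATURES)
}

if __name__ == "__main__":
    for name, score in sorted(global_importance.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Each row of `local_shap` corresponds to what the abstract calls a local result (values that could be mapped per study unit), while `global_importance` ranks features for the whole area. Brute-force enumeration grows exponentially with the number of features, so with 17 variables a tree-specific SHAP implementation would be the practical choice.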
