Article

Tree-Based Machine Learning to Identify and Understand Major Determinants for Stroke at the Neighborhood Level

Journal

Journal of the American Heart Association

Publisher

WILEY
DOI: 10.1161/JAHA.120.016745

Keywords

cardiovascular health; neighborhood; prevention; tree-based machine learning; variable selection

Funding

  1. Patient-Centered Outcomes Research Institute [ME2017C3 9041]
  2. National Heart, Lung, and Blood Institute of the National Institutes of Health [R01HL141427]
  3. National Institute on Minority Health and Health Disparities of the National Institutes of Health [R01MD013886]
  4. National Cancer Institute of the National Institutes of Health [R21CA235153, R21CA245855]


Background: Stroke is a major cardiovascular disease that causes significant health and economic burden in the United States. Neighborhood community-based interventions have been shown to be both effective and cost-effective in preventing cardiovascular disease. There is a dearth of robust studies identifying the key determinants of cardiovascular disease and the underlying effect mechanisms at the neighborhood level. We aim to contribute to the evidence base for neighborhood cardiovascular health research.

Methods and Results: We created a new neighborhood health data set at the census tract level by integrating 4 types of potential predictors, including unhealthy behaviors, prevention measures, sociodemographic factors, and environmental measures from multiple data sources. We used 4 tree-based machine learning techniques to identify the most critical neighborhood-level factors in predicting the neighborhood-level prevalence of stroke, and compared their predictive performance for variable selection. We further quantified the effects of the identified determinants on stroke prevalence using a Bayesian linear regression model. Of the 5 most important predictors identified by our method, higher prevalence of low physical activity, larger share of older adults, higher percentage of non-Hispanic Black people, and higher ozone levels were associated with higher prevalence of stroke at the neighborhood level. Higher median household income was linked to lower prevalence. The most important interaction term showed an exacerbated adverse effect of aging and low physical activity on the neighborhood-level prevalence of stroke.

Conclusions: Tree-based machine learning provides insights into underlying drivers of neighborhood cardiovascular health by discovering the most important determinants from a wide range of factors in an agnostic, data-driven, and reproducible way. The identified major determinants and the interactive mechanism can be used to prioritize and allocate resources to optimize community-level interventions for stroke prevention.
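
To make the idea of tree-based variable selection concrete, the following is a minimal sketch and not the authors' pipeline. The abstract does not name the 4 tree-based techniques, so gradient-boosted decision stumps are used here as a stand-in. The data are synthetic, and the feature names, coefficients, and function names (`best_stump`, `stump_boost_importance`) are hypothetical choices made only for illustration.

```python
import random

def best_stump(X, r):
    """Return the one-split tree (stump) that most reduces squared error on r."""
    n, total = len(r), sum(r)
    best = (0.0, None, None, 0.0, 0.0)  # gain, feature, threshold, left mean, right mean
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            if X[order[k - 1]][j] == X[order[k]][j]:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Reduction in sum of squared errors versus predicting the overall mean
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if gain > best[0]:
                thr = (X[order[k - 1]][j] + X[order[k]][j]) / 2
                best = (gain, j, thr, left_sum / k, right_sum / (n - k))
    return best

def stump_boost_importance(X, y, n_rounds=50, lr=0.3):
    """Boost stumps on residuals; credit each chosen feature with its split gain."""
    n = len(y)
    pred = [sum(y) / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        resid = [y[i] - pred[i] for i in range(n)]
        gain, j, thr, left_mean, right_mean = best_stump(X, resid)
        if j is None:
            break  # no split improves the fit any further
        importance[j] += gain  # gain is measured before shrinkage by lr
        for i in range(n):
            pred[i] += lr * (left_mean if X[i][j] <= thr else right_mean)
    total_gain = sum(importance) or 1.0
    return [v / total_gain for v in importance]

# Synthetic "census tracts": hypothetical predictors scaled to [0, 1].
rng = random.Random(0)
names = ["low_physical_activity", "share_age_65_plus", "median_income", "unrelated_noise"]
X = [[rng.random() for _ in names] for _ in range(200)]
# Assumed outcome: rises with the first two predictors, falls with income.
y = [3 * a + 2 * b - 1 * c + rng.gauss(0, 0.1) for a, b, c, _ in X]

importance = stump_boost_importance(X, y)
for name, score in sorted(zip(names, importance), key=lambda t: -t[1]):
    print(f"{name:24s}{score:.3f}")
```

The normalized scores rank predictors by how much squared error their splits remove. In a study like the one described, such a ranking would only shortlist candidate determinants; the direction and size of each effect, and any interaction terms, would still need a separate regression step.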

