4.6 Article

Multidimensional Population Health Modeling: A Data-Driven Multivariate Statistical Learning Approach

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 22737-22755

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3153482

Keywords

Statistics; Sociology; Biological system modeling; Aging; Pediatrics; Random forests; Licenses; Data-driven framework; multivariate tree boosting; multidimensional population health; variable selection

This paper proposes a data-driven multivariate statistical learning approach for studying multidimensional population health, considering the influence of various factors such as health behaviors, clinical care, socioeconomic factors, physical environment, and demographics. The results show that this approach outperforms traditional linear regression and random forest in modeling and predicting population health outcomes. The proposed framework can be used as a decision support tool for accurately assessing and predicting multivariate population health.
Population health is multidimensional in nature and has complex relationships with its various determinants. However, most previous studies investigate a single dimension of population health using linear models, and so fail to capture both the nonlinearity in the data and the interdependence among multiple dimensions of health outcomes. In this paper, we propose a data-driven multivariate statistical learning approach that simultaneously models several aspects of population health, characterizing the length and quality of life, as a function of health behaviors, clinical care, socioeconomic factors, physical environment, and demographics. We also propose a novel percentile-based variable selection method for multivariate regression that does not compromise the model's generalization performance. We demonstrate the applicability of the proposed methodological framework using New York State as a case study. Based on cross-validation and statistical hypothesis tests, the results indicate that the multivariate tree boosting method outperforms both the traditionally used univariate linear regression model and random forest in modeling multidimensional population health. A variable importance heat map illustrates the relative influence of the key health determinants on the various dimensions of population health. Partial dependence plots are used to quantify the marginal effects of the health inputs and their nonlinear relationships with the health outcomes. Our results show that teen birth rate is strongly associated with both length of life (e.g., child mortality) and quality of life (e.g., physically unhealthy days). Socioeconomic status is the key indicator for predicting child and infant mortality. The proposed framework can be used as a decision support tool for accurately assessing and predicting multivariate population health.
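The partial dependence plots mentioned in the abstract rest on a simple averaging idea, which can be sketched in a few lines. The example below is a minimal illustration and not the authors' implementation: `partial_dependence`, `toy_model`, and the three data rows are hypothetical names and values chosen for the sketch, and the toy model stands in for any fitted predictor of a single health outcome. In a multivariate setting the same computation would be repeated for each outcome.

```python
from statistics import mean


def partial_dependence(predict, rows, feature_index, grid):
    """Return the average prediction at each grid value of one feature.

    For every grid value, the chosen feature is overwritten with that value
    in every data row, the model is evaluated, and the predictions are averaged.
    """
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)              # copy so the data stays untouched
            modified[feature_index] = value   # force the feature to the grid value
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


# Hypothetical stand-in for a fitted model: nonlinear in feature 0, linear in feature 1.
def toy_model(x):
    return x[0] ** 2 + 2.0 * x[1]


# Hypothetical data: each row is [feature 0, feature 1].
rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]

pd_curve = partial_dependence(toy_model, rows, 0, grid)
print(pd_curve)  # [4.0, 5.0, 8.0]
```

The resulting curve follows the squared term in feature 0, shifted by the average contribution of feature 1, which is how a partial dependence plot exposes a nonlinear marginal effect.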
