Proceedings Paper

Automated Hot-spot Identification for Spatial Investigation of Disease Indicators

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/BigDataService.2019.00009

Keywords

Hot-spot Identification; G(i)*; Variable Selection; Exposome; Spatial Auto-Correlation; Variable Diffusion

Funding

  1. National Science Foundation [1238338]
  2. Directorate For Engineering
  3. Div Of Industrial Innovation & Partnersh [1238338] Funding Source: National Science Foundation

Abstract

This paper presents a new procedure that uses spatial statistics to identify clusters of counties with either a high or a low incidence of a disease (the dependent variable). These clusters form a spatial snapshot that describes the disease across the study area. Using this snapshot as a reference, the procedure evaluates potential factors (independent variables) and ranks them by how closely their own spatial snapshots resemble that of the disease. The greater the similarity, the greater the likelihood of a causal relationship. Similarity can also guide the selection of variables to consider, rather than relying only on the researcher's expertise. The procedure is applied to Cardiovascular Disease at the county level for the 48 contiguous states using the Public Health Exposome, a data repository of environmental factors to which a given group of people may be exposed over the course of their lifetime and that may affect their health. The proposed procedure can analyze a study area with a large number of regions, such as an entire country, while retaining the detail of a smaller area, such as a county. By contrast, researchers may limit their work to a small number of regions because of computational and analytical limitations. In addition, the procedure yields a ranking of independent variables according to their effect on the dependent variable. Public Health researchers have previously reported that analytical approaches required days of extremely complex statistics and computational time, which restricted their analysis to 60 variables. Run at the Texas Tech High Performance Computing Center, the proposed procedure takes 12 minutes for 168 variables and a study area with 3,028 regions.
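
The abstract does not spell out the computations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It computes the standard Getis-Ord G(i)* z-score for each region, labels hot and cold spots to form a snapshot, and ranks candidate factors by how well their snapshots agree with the disease snapshot. The function names, the 1.96 significance threshold, the agreement-based similarity score, and the toy 10-region chain are all assumptions made for illustration.

```python
import numpy as np


def gi_star(values, weights):
    """Getis-Ord Gi* z-score per region.

    `weights` is an (n, n) spatial weights matrix that includes each
    region as its own neighbour (non-zero diagonal).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


def snapshot(z, threshold=1.96):
    """Label regions: +1 hot spot, -1 cold spot, 0 not significant."""
    z = np.asarray(z)
    return np.where(z >= threshold, 1, np.where(z <= -threshold, -1, 0))


def similarity(a, b):
    """Assumed score: among regions flagged in either snapshot,
    the fraction whose labels agree (0.0 if nothing is flagged)."""
    significant = (a != 0) | (b != 0)
    if not significant.any():
        return 0.0
    return float((a[significant] == b[significant]).mean())


def rank_variables(disease, factors, weights):
    """Rank candidate factors by snapshot similarity to the disease."""
    reference = snapshot(gi_star(disease, weights))
    scores = {
        name: similarity(reference, snapshot(gi_star(vals, weights)))
        for name, vals in factors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy study area: 10 regions in a row, each neighbouring itself and
# the regions immediately to its left and right (hypothetical data).
n = 10
w = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
disease = np.array([9, 9, 9, 0, 0, 0, 0, 0, 0, 0])
factors = {
    "factor_a": np.array([5, 5, 5, 1, 1, 1, 1, 1, 1, 1]),  # same pattern
    "factor_b": np.array([1, 1, 1, 1, 1, 1, 1, 5, 5, 5]),  # opposite end
}
ranking = rank_variables(disease, factors, w)
```

In this toy case `factor_a` clusters in the same regions as the disease and ranks first, while `factor_b` clusters at the opposite end of the chain and ranks last. Real county-level data would need a weights matrix built from actual adjacency or distance, and probably a correction for multiple testing, neither of which is covered here.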
