Proceedings Paper

Automated Hot-spot Identification for Spatial Investigation of Disease Indicators

Publisher

IEEE Computer Society
DOI: 10.1109/BigDataService.2019.00009

Keywords

Hot-spot Identification; G(i)*; Variable Selection; Exposome; Spatial Auto-Correlation; Variable Diffusion

Funding

  1. National Science Foundation [1238338]
  2. Directorate For Engineering
  3. Div Of Industrial Innovation & Partnersh [1238338] Funding Source: National Science Foundation


This paper presents a new procedure that uses spatial statistics to identify clusters of counties with either a high or a low incidence of a disease (the dependent variable). These counties form a spatial snapshot that describes the disease in the study area. Using this snapshot as a reference, the procedure evaluates potential factors (independent variables) and ranks them by how similar their spatial snapshots are to that of the disease. The greater the similarity, the greater the likelihood of a causal relationship. Similarity can also guide the selection of variables to consider, rather than relying solely on the researcher's expertise. Specifically, the procedure is applied to cardiovascular disease at the county level for the contiguous 48 states, using the Public Health Exposome, a data repository of environmental factors to which a given group of people may be exposed over their lifetime and which may affect their health. The procedure can analyze a study area with a large number of regions, such as an entire country, while retaining the level of detail of a smaller unit, such as a county. In contrast, researchers may otherwise limit their work to a small number of regions because of computational and analytical constraints. In addition, the procedure yields a ranking of the independent variables according to their effect on the dependent variable. Public health researchers have previously reported that analytical approaches required days of highly complex statistical computation, which restricted their analyses to 60 variables. The proposed procedure, run at the Texas Tech High Performance Computing Center, takes 12 minutes for 168 variables and a study area of 3,028 regions.
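The abstract does not specify how the snapshots are built or compared, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes the standard Getis-Ord Gi* statistic (named in the keywords) with a region counted as its own neighbor, a 1.96 z-score cutoff (95% confidence) to label hot and cold spots, and a hypothetical similarity measure: the share of regions whose hot/cold/neutral labels agree. All function names (`getis_ord_gi_star`, `hot_cold_labels`, `snapshot_similarity`, `rank_variables`) and the toy data are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def getis_ord_gi_star(values: Sequence[float],
                      weights: Sequence[Sequence[float]]) -> List[float]:
    """Return the Getis-Ord Gi* z-score for every region.

    weights[i][j] is the spatial weight between regions i and j. Gi* counts a
    region as its own neighbor, so weights[i][i] is expected to be nonzero.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation; clamp tiny negative rounding error to 0.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for w in weights:
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt(max(0.0, (n * sum_w2 - sum_w ** 2) / (n - 1)))
        # A constant variable (s == 0) has no hot or cold spots.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


def hot_cold_labels(z_scores: Sequence[float], z_crit: float = 1.96) -> List[int]:
    """Label each region +1 (hot spot), -1 (cold spot) or 0 (not significant)."""
    return [1 if z > z_crit else -1 if z < -z_crit else 0 for z in z_scores]


def snapshot_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical similarity: share of regions whose labels agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_variables(disease: Sequence[float],
                   candidates: Dict[str, Sequence[float]],
                   weights: Sequence[Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank candidate variables by how closely their snapshot matches the disease."""
    reference = hot_cold_labels(getis_ord_gi_star(disease, weights))
    scored = [
        (name, snapshot_similarity(reference,
                                   hot_cold_labels(getis_ord_gi_star(vals, weights))))
        for name, vals in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy study area: 10 regions in a row, each adjacent to itself and its
    # immediate neighbors (binary contiguity weights).
    n = 10
    weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
    disease = [10.0] * 3 + [0.0] * 7
    candidates = {
        "mirrored": disease[::-1],
        "negated": [-x for x in disease],
        "scaled": [2 * x + 5 for x in disease],
    }
    for name, score in rank_variables(disease, candidates, weights):
        print(f"{name}: {score:.1f}")
```

Run as a script, the toy example ranks `scaled` first (1.0), because Gi* z-scores are unchanged by a positive linear rescaling, then `negated` (0.8), whose hot spots become cold spots, then `mirrored` (0.6), whose hot spots sit at the opposite end of the study area. A real county-level analysis would need a weights matrix over all 3,028 regions, and the choice of weights, cutoff, and similarity measure would all affect the ranking.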
