Article

Generating Realistic Synthetic Population Datasets

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3182383

Keywords

Multivariate categorical data; synthetic population; maximum entropy models; probabilistic modeling

Funding

  1. Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) [D12PC000337]
  2. National Science Foundation [DGE-1545362, IIS-1633363]
  3. Army Research Laboratory [W911NF-17-1-0021]
  4. Cluster of Excellence 'Multimodal Computing and Interaction' within the Excellence Initiative of the German Federal Government

Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating the confidentiality of individuals. To generate such datasets over a large set of categorical variables, we propose using the maximum entropy principle to formalize a generative model that, in a statistically well-founded way, makes optimal use of the given prior information about the data and is unbiased otherwise. We design an efficient inference algorithm to estimate the maximum entropy model and demonstrate that the approach is adept at estimating underlying data distributions. We evaluate this approach on both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application.
