4.6 Article

Model Selection Criteria for Missing-Data Problems Using the EM Algorithm

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 103, Issue 484, Pages 1648-1658

Publisher

AMER STATISTICAL ASSOC
DOI: 10.1198/016214508000001057

Keywords

EM algorithm; H-function; Kullback-Leibler divergence; Missing data; Q-function

Funding

  1. National Science Foundation [SES-06-43663, BCS-0826844] Funding Source: Medline
  2. NCI NIH HHS [R01 CA074015, R01 CA074015-10] Funding Source: Medline
  3. NCRR NIH HHS [UL1 RR025747] Funding Source: Medline
  4. NIGMS NIH HHS [R01 GM070335-12, R01 GM070335] Funding Source: Medline

We consider novel methods for the computation of model selection criteria in missing-data problems based on the output of the EM algorithm. The methodology is very general and can be applied to numerous situations involving incomplete data within an EM framework, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Toward this goal, we develop a class of information criteria for missing-data problems, called $\mathrm{IC}_{H,Q}$, which yields the Akaike information criterion and the Bayesian information criterion as special cases. The computation of $\mathrm{IC}_{H,Q}$ requires an analytic approximation to a complicated function, called the H-function, along with output from the EM algorithm used in obtaining maximum likelihood estimates. The approximation to the H-function leads to a large class of information criteria, called $\mathrm{IC}_{\tilde{H}(k),Q}$. Theoretical properties of $\mathrm{IC}_{\tilde{H}(k),Q}$, including consistency, are investigated in detail. To eliminate the analytic approximation to the H-function, a computationally simpler approximation to $\mathrm{IC}_{H,Q}$, called $\mathrm{IC}_{Q}$, is proposed, the computation of which depends solely on the Q-function of the EM algorithm. Advantages and disadvantages of $\mathrm{IC}_{\tilde{H}(k),Q}$ and $\mathrm{IC}_{Q}$ are discussed and examined in detail in the context of missing-data problems. Extensive simulations are given to demonstrate the methodology and examine the small-sample and large-sample performance of $\mathrm{IC}_{\tilde{H}(k),Q}$ and $\mathrm{IC}_{Q}$ in missing-data problems. An AIDS data set is also presented to illustrate the proposed methodology.
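
As a reading aid, the sketch below shows how a criterion built from the Q- and H-functions can reduce to the Akaike and Bayesian criteria. It relies on the standard EM decomposition of the observed-data log-likelihood and is not quoted from the paper. The notation ($D_{\mathrm{obs}}$, $D_{\mathrm{mis}}$, $\hat\theta$, $d$, $n$, $c_n$) is assumed for illustration, and the paper's exact definitions and penalty terms may differ.

```latex
% Assumed notation: D_obs = observed data, D_mis = missing data,
% \hat\theta = MLE from EM, d = number of parameters, n = sample size.
\begin{align*}
Q(\theta \mid \theta') &= E\!\left[\log f(D_{\mathrm{obs}}, D_{\mathrm{mis}} \mid \theta) \,\middle|\, D_{\mathrm{obs}}, \theta'\right], \\
H(\theta \mid \theta') &= E\!\left[\log f(D_{\mathrm{mis}} \mid D_{\mathrm{obs}}, \theta) \,\middle|\, D_{\mathrm{obs}}, \theta'\right], \\
\log f(D_{\mathrm{obs}} \mid \theta) &= Q(\theta \mid \theta') - H(\theta \mid \theta'), \\[4pt]
\mathrm{IC}_{H,Q} &= -2\,Q(\hat\theta \mid \hat\theta) + 2\,H(\hat\theta \mid \hat\theta) + c_n
  \;=\; -2 \log f(D_{\mathrm{obs}} \mid \hat\theta) + c_n, \\
\mathrm{IC}_{Q} &= -2\,Q(\hat\theta \mid \hat\theta) + c_n .
\end{align*}
% With c_n = 2d the first criterion matches AIC; with c_n = d \log n it matches BIC.
```

Under these assumptions, $\mathrm{IC}_{Q}$ simply drops the H-term, which is why it needs only the Q-function already computed in the E-step.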
