Article

A Framework for Multiple Imputation in Cluster Analysis

Journal

American Journal of Epidemiology
Volume 177, Issue 7, Pages 718-725

Publisher

Oxford University Press Inc.
DOI: 10.1093/aje/kws289

Keywords

classification; cluster analysis; imputation; missing data

Funding

  1. Fondo de Investigacion Sanitaria, Ministry of Health, Madrid, Spain [PI020541, PI052486, PI052302, PI060684]
  2. Agencia d'Avaluacio de Tecnologia i Recerca Mediques, Catalonia Government, Barcelona, Spain [035/20/02]
  3. Spanish Society of Pneumology and Thoracic Surgery [2002/137]
  4. Catalan Foundation of Pneumology [2003 Beca Maria Rava]
  5. Red Respira [C03/11]
  6. Red de Centros de Investigacion Cooperativa en Epidemiologia y Salud Publica [C03/09]
  7. Fundacio La Marato de TV3 [041110]
  8. Novartis Farmaceutica, Barcelona, Spain
  9. Instituto de Salud Carlos III, Ministry of Health, Madrid, Spain
  10. Instituto de Salud Carlos III [CP05/00118]


Multiple imputation is a common technique for dealing with missing values and is mostly applied in regression settings. Its application to cluster analysis, where the main objective is to classify individuals into homogeneous groups, involves several difficulties that are not well characterized in the current literature. In this paper, we propose a framework for applying multiple imputation to cluster analysis when the original data contain missing values. The proposed framework incorporates the selection of the final number of clusters and a variable reduction procedure, which may be needed in data sets where the ratio of the number of persons to the number of variables is small. We suggest ways to report how the uncertainty due to multiple imputation of missing data affects the cluster analysis outcomes, namely the final number of clusters, the results of a variable selection procedure (if applied), and the assignment of individuals to clusters. The proposed framework is illustrated with data from the Phenotype and Course of Chronic Obstructive Pulmonary Disease (PAC-COPD) Study (Spain, 2004-2008), which aimed to classify patients with chronic obstructive pulmonary disease into different disease subtypes.
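The abstract does not name the imputation model, the clustering algorithm, or the criterion used to pick the number of clusters, so the following is only a hypothetical sketch of the general idea, not the authors' procedure. It assumes a random nearest-neighbour hot-deck as the imputer, k-means as the clustering method, and the mean silhouette width as the criterion for the number of clusters; the variable reduction step is omitted. The data set is imputed m times and clustered separately each time. Two summaries are then reported: how often each number of clusters is selected, and a co-assignment matrix giving the fraction of imputations in which two individuals share a cluster. All function names (impute_once, kmeans, silhouette, mi_cluster) are illustrative.

```python
import numpy as np
from collections import Counter


def impute_once(X, rng, n_donors=5):
    """One stochastic imputation (random nearest-neighbour hot-deck, an assumption).

    Each incomplete row copies its missing entries from a donor drawn at random
    among the n_donors complete rows closest on the row's observed variables.
    """
    Xi = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        if obs.any():
            dist = ((donors[:, obs] - X[i, obs]) ** 2).sum(axis=1)
            pool = np.argsort(dist)[:n_donors]
        else:
            pool = np.arange(len(donors))  # nothing observed: any donor may be used
        Xi[i, ~obs] = donors[rng.choice(pool), ~obs]
    return Xi


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means started from k random data points; returns labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def silhouette(X, labels):
    """Mean silhouette width with Euclidean distances (singletons score 0)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    ks = np.unique(labels)
    if len(ks) < 2:
        return -1.0
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1
        if n_same == 0:
            continue
        a = D[i, same].sum() / n_same  # D[i, i] is 0, so self adds nothing
        b = min(D[i, labels == c].mean() for c in ks if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s.mean()


def mi_cluster(X, m=20, k_range=range(2, 6), seed=0):
    """Cluster each of m imputed data sets; summarize uncertainty across them."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    chosen_k = []
    co_assign = np.zeros((n, n))
    for _ in range(m):
        Xi = impute_once(X, rng)
        Xi = (Xi - Xi.mean(axis=0)) / Xi.std(axis=0)  # standardize each imputed set
        best_score, best_k, best_labels = -np.inf, None, None
        for k in k_range:
            labels = kmeans(Xi, k, rng)
            score = silhouette(Xi, labels)
            if score > best_score:
                best_score, best_k, best_labels = score, k, labels
        chosen_k.append(best_k)
        # Same-cluster indicator is invariant to how clusters are numbered.
        co_assign += best_labels[:, None] == best_labels[None, :]
    return Counter(chosen_k), co_assign / m


# Toy data: two well-separated groups; every 4th row lacks variable 2.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.3, (15, 2)),
               data_rng.normal(5.0, 0.3, (15, 2))])
X[::4, 1] = np.nan
k_counts, P = mi_cluster(X, m=10)
print(k_counts)           # how often each number of clusters was selected
print(P[1, 2], P[1, 21])  # co-assignment: same-group pair vs cross-group pair
```

Comparing same-cluster indicators instead of raw labels avoids the label-switching problem: cluster "1" in one imputed data set need not correspond to cluster "1" in another. Individuals whose co-assignment values sit far from 0 and 1 are the ones whose classification depends most on the imputed values.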
