Outlying observations and missing values: How should they be handled?

CLINICAL AND EXPERIMENTAL PHARMACOLOGY AND PHYSIOLOGY (2008)

Journal

CLINICAL AND EXPERIMENTAL PHARMACOLOGY AND PHYSIOLOGY

Volume 35, Issue 5-6, Pages 670-678

Publisher

BLACKWELL PUBLISHING

DOI: 10.1111/j.1440-1681.2007.04860.x

Keywords

bootstrapping; box-and-whisker plot; data transformation; imputation; percentiles; permutation tests; randomness; robust methods; scatterplot

Abstract

1. The problems of, and best solutions for, outlying observations and missing values depend strongly on the sizes of the experimental groups. For original articles published in Clinical and Experimental Pharmacology and Physiology during 2006-2007, group sizes ranged from three to 44 ('small groups'). In surveys, epidemiological studies and clinical trials, group sizes range from hundreds to thousands ('large groups').
2. How can one detect outlying (extreme) observations? The best methods are graphical, for instance: (i) a scatterplot, often with mean ± 2 s; and (ii) a box-and-whisker plot. Even with these, it is a matter of judgement whether observations are truly outlying.
3. It is permissible to delete or replace outlying observations if an independent explanation for them can be found. This may be, for instance, failure of a piece of measuring equipment or human error in operating it. If the observation is deleted, it can then be treated as a missing value. Rarely, the appropriate portion of the study can be repeated.
4. It is decidedly not permissible to delete unexplained extreme values. Some of the acceptable strategies for handling them are: (i) transform the data and proceed with conventional statistical analyses; (ii) use the mean for location, but use permutation (randomization) tests for comparing means; and (iii) use robust methods for describing location (e.g. median, geometric mean, trimmed mean), for indicating dispersion (range, percentiles), for comparing locations and for regression analysis.
5. What can be done about missing values? Some strategies are: (i) ignore them; (ii) replace them by hand if the data set is small; and (iii) use computerized imputation techniques to replace them if the data set is large (e.g. regression or EM (conditional expectation, maximum likelihood estimation) methods).
6. If the missing values are ignored, or even if they are replaced, it is essential to test whether the individuals with missing values are otherwise indistinguishable from the remainder of the group. If the missing values have not occurred at random, but are associated with some property of the individuals being studied, the subsequent analysis may be biased.
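
As an illustration of points 2 and 4, the following is a minimal Python sketch, not the article's own procedure. It assumes the common 1.5 × IQR rule for box-and-whisker fences and a Monte Carlo (resampled) permutation test rather than full enumeration. The function names and the two small data sets are hypothetical and were invented for this example. The extreme value is flagged for inspection only; it is kept in the analysis, as point 4 requires.

```python
import random
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) box-and-whisker fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_extremes(values, k=1.5):
    """List observations outside the fences (flagged for inspection, not deleted)."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


def permutation_test_means(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled observations to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical data: two small groups, one containing an unexplained extreme value.
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
treated = [4.9, 5.2, 4.7, 5.0, 9.6, 5.1]

print("flagged in treated:", flag_extremes(treated))
print("median of treated (robust location):", statistics.median(treated))
print("permutation p-value:", permutation_test_means(control, treated))
```

The median is printed alongside the test because it is one of the robust location measures listed in point 4(iii); with an extreme value present it moves far less than the mean does.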
