4.5 Article

Sparse data bias: a problem hiding in plain sight

Journal

BMJ-BRITISH MEDICAL JOURNAL
Volume 353, Issue -, Pages -

Publisher

BMJ PUBLISHING GROUP
DOI: 10.1136/bmj.i1981

Keywords

-

Ask authors/readers for more resources

Effects of treatment or other exposure on outcome events are commonly measured by ratios of risks, rates, or odds. Adjusted versions of these measures are usually estimated by maximum likelihood regression (eg, logistic, Poisson, or Cox modelling). But resulting estimates of effect measures can have serious bias when the data lack adequate case numbers for some combination of exposure and outcome levels. This bias can occur even in quite large datasets and is hence often termed sparse data bias. The bias can arise or be worsened by regression adjustment for potentially confounding variables; in the extreme, the resulting estimates could be impossibly huge or even infinite values that are meaningless artefacts of data sparsity. Such estimate inflation might be obvious in light of background information, but is rarely noted let alone accounted for in research reports. We outline simple methods for detecting and dealing with the problem focusing especially on penalised estimation, which can be easily performed with common software packages.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available