Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression

HEALTH PSYCHOLOGY AND BEHAVIORAL MEDICINE (2021)

Journal

HEALTH PSYCHOLOGY AND BEHAVIORAL MEDICINE

Volume 9, Issue 1, Pages 436-455

Publisher

ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD

DOI: 10.1080/21642850.2021.1920416

Keywords

Count data; Poisson regression; negative binomial regression; skewed data; tutorial

Automated Summary

This paper demonstrates the application of count distribution models to health psychology data, showing that they fit better than traditional regression/linear models, especially for data with a large number of zeros and extreme values. The negative binomial distribution was the best fit for overdispersed data, while both negative binomial and zero-inflated negative binomial models were suitable for data with abundant zeros.

Abstract

Background: Dependent variables in health psychology are often counts, for example, of a behaviour or of the number of engagements with an intervention. These counts can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. Consider the question 'How many cigarettes do you smoke on an average day?' The modal answer may be zero, but answers may range from 0 to 40+. The same can be true for minutes of moderate-to-vigorous physical activity: for some people this may be near zero, but it can take on extreme values for someone training for a marathon. Typical analytical strategies for these data involve explicit (or implied) transformations (smoker vs. non-smoker, log transformations). However, these data types are 'counts' (i.e. non-negative whole numbers) or quasi-counts (time is ratio, but discrete minutes of activity could be analysed as a count), and can be modelled using count distributions - including the Poisson and negative binomial distributions (and their zero-inflated and hurdle extensions, which allow even more zeros).

Methods: In this tutorial paper I demonstrate (in R, Jamovi, and SPSS) the easy application of these models to health psychology data, and their advantages over alternative ways of analysing this type of data, using two datasets: one with a highly dispersed dependent variable (number of views on YouTube), and another with a large number of zeros (number of days on which symptoms were reported over a month).

Results: The negative binomial distribution had the best fit for the overdispersed number of views on YouTube. Negative binomial and zero-inflated negative binomial were both good fits for the symptom data with over-abundant zeros.

Conclusions: In both cases, count distributions not only provided a better fit but would also lead to different conclusions compared to the poorly fitting traditional regression/linear models.
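The contrast between the Poisson and negative binomial distributions described above can be made concrete with a quick diagnostic. The sketch below is illustrative only and is not the paper's code (the tutorial itself works in R, Jamovi, and SPSS). It assumes a small, made-up vector of symptom-day counts, and the function name `dispersion_summary` is hypothetical. It relies on two standard facts: a Poisson variable has variance equal to its mean, with P(Y = 0) = exp(-mean), while a negative binomial variable with mean mu and size theta has variance mu + mu^2/theta, with P(Y = 0) = (theta / (theta + mu))^theta. The size parameter is estimated here by a simple method-of-moments step rather than by maximum likelihood.

```python
import math
from statistics import mean, variance

# Hypothetical counts (made up for illustration, NOT data from the paper):
# number of days in a month on which each of 20 people reported symptoms.
symptom_days = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 12, 20]


def dispersion_summary(counts):
    """Compare observed counts with what Poisson and negative binomial imply."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)

    # Share of exact zeros actually observed.
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)

    # Poisson assumes variance == mean, so P(Y = 0) = exp(-mean).
    poisson_zero = math.exp(-m)

    # Negative binomial: variance = m + m^2 / theta.
    # Method-of-moments estimate of theta, defined only when v > m.
    theta = m * m / (v - m) if v > m else math.inf
    nb_zero = poisson_zero if math.isinf(theta) else (theta / (theta + m)) ** theta

    return {
        "mean": m,
        "variance": v,
        "variance_to_mean": v / m,  # about 1 under Poisson, above 1 if overdispersed
        "observed_zero": observed_zero,
        "poisson_zero": poisson_zero,
        "nb_theta": theta,
        "nb_zero": nb_zero,
    }


summary = dispersion_summary(symptom_days)
for name, value in summary.items():
    print(f"{name:>17}: {value:.3f}")
```

A variance-to-mean ratio well above 1, together with far more observed zeros than the Poisson distribution implies, is the pattern that motivates the negative binomial and zero-inflated models discussed in the abstract. A formal comparison would fit the candidate models and compare their fit rather than rely on moments alone.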
