Modelling skewed data with many zeros: A simple approach combining ordinary and logistic regression

Journal

ENVIRONMENTAL AND ECOLOGICAL STATISTICS
Volume 12, Issue 1, Pages 45-54

Publisher

SPRINGER
DOI: 10.1007/s10651-005-6817-1

Keywords

abundance; bootstrap; conditional model; Evechinus; Ecklonia

We discuss a method for analyzing data that are positively skewed and contain a substantial proportion of zeros. Such data commonly arise in ecological applications, when the focus is on the abundance of a species. The form of the distribution is then due to the patchy nature of the environment and/or the inherent heterogeneity of the species. The method can be used whenever we wish to model the data as a response variable in terms of one or more explanatory variables. The analysis consists of three stages. The first involves creating two sets of data from the original: one shows whether or not the species is present; the other indicates the logarithm of the abundance when it is present. These are referred to as the 'presence' data and the 'log-abundance' data, respectively. The second stage involves modelling the presence data using logistic regression, and separately modelling the log-abundance data using ordinary regression. Finally, the third stage involves combining the two models in order to estimate the expected abundance for a specific set of values of the explanatory variables. A common approach to analyzing this sort of data is to use a ln(y + c) transformation, where c is some constant (usually one). The method we use here avoids the need for an arbitrary choice of the value of c, and allows the modelling to be carried out in a natural and straightforward manner, using well-known regression techniques. The approach we put forward is not original, having been used in both conservation biology and fisheries. Our objectives in this paper are to (a) promote the application of this approach in a wide range of settings and (b) suggest that parametric bootstrapping be used to provide confidence limits for the estimate of expected abundance. © 2005 Springer Science + Business Media, Inc.
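
The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that log-abundance, given presence, is normally distributed with constant variance, so that expected abundance can be taken as P(present) × exp(μ + σ²/2); the abstract does not state this formula. The function names (`fit_logistic`, `fit_two_part`, `expected_abundance`, `bootstrap_ci`), the use of a percentile interval, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X' (z - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (z - p))
    return beta

def fit_two_part(X, y):
    """Stages 1 and 2: split the data, then fit the two regressions."""
    present = y > 0
    beta = fit_logistic(X, present.astype(float))      # presence data
    Xp, logy = X[present], np.log(y[present])          # log-abundance data
    gamma, *_ = np.linalg.lstsq(Xp, logy, rcond=None)  # ordinary regression
    resid = logy - Xp @ gamma
    sigma2 = resid @ resid / (len(logy) - X.shape[1])  # residual variance
    return beta, gamma, sigma2

def expected_abundance(x0, beta, gamma, sigma2):
    """Stage 3 (assumed lognormal form): P(present) * E[abundance | present]."""
    p = 1.0 / (1.0 + np.exp(-x0 @ beta))
    return p * np.exp(x0 @ gamma + sigma2 / 2.0)

def bootstrap_ci(X, x0, beta, gamma, sigma2, n_boot=500, level=0.95, seed=0):
    """Parametric bootstrap: simulate from the fitted models, refit, re-estimate."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    mu = X @ gamma
    estimates = []
    for _ in range(n_boot):
        present = rng.random(len(p)) < p
        if present.sum() <= X.shape[1]:
            continue  # too few non-zero values to refit the regression
        y_sim = np.where(present, np.exp(rng.normal(mu, np.sqrt(sigma2))), 0.0)
        estimates.append(expected_abundance(x0, *fit_two_part(X, y_sim)))
    alpha = 1.0 - level
    return np.quantile(estimates, [alpha / 2.0, 1.0 - alpha / 2.0])

# Simulated example: one explanatory variable plus an intercept (made-up values).
rng = np.random.default_rng(42)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
is_present = rng.random(200) < 1.0 / (1.0 + np.exp(-(0.2 + 0.8 * x)))
y = np.where(is_present, np.exp(1.0 + 0.5 * x + rng.normal(0.0, 0.5, 200)), 0.0)

beta, gamma, sigma2 = fit_two_part(X, y)
x0 = np.array([1.0, 0.0])  # estimate expected abundance at x = 0
estimate = expected_abundance(x0, beta, gamma, sigma2)
lower, upper = bootstrap_ci(X, x0, beta, gamma, sigma2)
print(f"expected abundance at x=0: {estimate:.2f} ({lower:.2f}, {upper:.2f})")
```

No constant c is needed in this sketch: zeros enter only the presence model, and logarithms are taken only of positive values. Other bootstrap interval constructions could be used in place of the percentile interval.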
