4.5 Article

Presence-Only Data and the EM Algorithm

Journal

BIOMETRICS
Volume 65, Issue 2, Pages 554-563

Publisher

WILEY
DOI: 10.1111/j.1541-0420.2008.01116.x

Keywords

Boosted trees; EM algorithm; Logistic model; Presence-only data; Use-availability data

Funding

  1. NATIONAL INSTITUTE OF BIOMEDICAL IMAGING AND BIOENGINEERING [R01EB001988] Funding Source: NIH RePORTER
  2. NIBIB NIH HHS [R01 EB001988] Funding Source: Medline

Ask authors/readers for more resources

In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation-maximization algorithm to estimate the underlying presence-absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence-absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available