Article

Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species

Journal

METHODS IN ECOLOGY AND EVOLUTION
Volume 4, Issue 11, Pages 1091-1100

Publisher

WILEY
DOI: 10.1111/2041-210X.12108

Keywords

diversity; Good-Turing frequency formula; mutual information; sample coverage; Shannon entropy; species accumulation curve; species discovery rate

Funding

  1. Taiwan National Science Council [100-2118-M007-006-MY3]

Abstract

  1. Estimating Shannon entropy and its exponential from incomplete samples is a central objective of many research fields. However, empirical estimates of Shannon entropy and its exponential depend strongly on sample size and typically exhibit substantial bias. This work uses a novel method to obtain an accurate, low-bias analytic estimator of entropy, based on species frequency counts. Our estimator does not require prior knowledge of the number of species.

  2. We show that there is a close relationship between Shannon entropy and the species accumulation curve, which depicts the cumulative number of observed species as a function of sample size. We reformulate entropy in terms of the expected discovery rates of new species with respect to sample size, that is, the successive slopes of the species accumulation curve. Our estimator is obtained by applying slope estimators derived from an improved Good-Turing frequency formula. Our method is also applied to estimate mutual information.

  3. Extensive simulations from theoretical models and real surveys show that if sample size is not unreasonably small, the resulting entropy estimator is nearly unbiased. Our estimator generally outperforms previous methods in terms of bias and accuracy (low mean squared error) especially when species richness is large and there is a large fraction of undetected species in samples.

  4. We discuss the extension of our approach to estimate Shannon entropy for multiple incidence data. The use of our estimator in constructing an integrated rarefaction and extrapolation curve of entropy (or mutual information) as a function of sample size or sample coverage (an aspect of sample completeness) is also discussed.
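The link between entropy and the species accumulation curve described in the abstract can be checked numerically. The sketch below is illustrative only and is not the authors' estimator. It relies on the series -log p = sum over k >= 1 of (1 - p)^k / k, which gives H = sum over k >= 1 of (1/k) * Delta(k), where Delta(k) = sum_i p_i (1 - p_i)^k is the expected slope of the accumulation curve between sample sizes k and k + 1. It also shows the downward bias of the empirical (plug-in) entropy and the classical Turing coverage estimate 1 - f1/n, which is a simpler form than the improved Good-Turing formula the paper refers to. The community `p` (50 species with abundances proportional to 1/i), the function names, and the sample sizes are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def shannon_entropy(p):
    """True Shannon entropy (natural log) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def expected_slope(p, k):
    """Expected discovery rate E[S(k+1)] - E[S(k)] = sum_i p_i (1 - p_i)^k."""
    return sum(pi * (1.0 - pi) ** k for pi in p)


def entropy_from_slopes(p, k_max):
    """Entropy as a weighted sum of accumulation-curve slopes, truncated at k_max."""
    return sum(expected_slope(p, k) / k for k in range(1, k_max + 1))


def plug_in_entropy(counts):
    """Empirical entropy computed from observed species counts."""
    n = sum(counts)
    return -sum((x / n) * math.log(x / n) for x in counts if x > 0)


def turing_coverage(counts):
    """Classical Turing estimate of sample coverage: 1 - f1/n (f1 = singletons)."""
    n = sum(counts)
    f1 = sum(1 for x in counts if x == 1)
    return 1.0 - f1 / n


def draw_counts(p, n, rng):
    """Draw n individuals from the community and return observed species counts."""
    sample = rng.choices(range(len(p)), weights=p, k=n)
    return list(Counter(sample).values())


# Hypothetical community: 50 species, relative abundance proportional to 1/i.
weights = [1.0 / i for i in range(1, 51)]
total = sum(weights)
p = [w / total for w in weights]

rng = random.Random(1)  # fixed seed so the run is repeatable
h_true = shannon_entropy(p)
h_series = entropy_from_slopes(p, 5000)
replicates = [plug_in_entropy(draw_counts(p, 100, rng)) for _ in range(200)]
h_plug_in_mean = sum(replicates) / len(replicates)

print("true entropy:            ", h_true)
print("entropy from slopes:     ", h_series)
print("mean plug-in (n = 100):  ", h_plug_in_mean)
print("coverage of one sample:  ", turing_coverage(draw_counts(p, 100, rng)))
```

The truncated slope series should agree with the true entropy to many decimal places, while the plug-in mean should fall below it, which is the sample-size dependence the abstract describes. Replacing the true slopes with estimates built from frequency counts is the step the paper develops; it is not reproduced here.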
