Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample

Environmental and Ecological Statistics

Abstract

A biological community usually has a large number of species with relatively small abundances. When a random sample of individuals is selected and each individual is classified according to species identity, some rare species may not be discovered. This paper is concerned with the estimation of Shannon’s index of diversity when the number of species and the species abundances are unknown. The traditional estimator, which ignores the missing species, underestimates the index when there is a non-negligible number of unseen species. We provide a different approach based on unequal probability sampling theory, motivated by the fact that species have different probabilities of being discovered in the sample. No parametric forms are assumed for the species abundances. The proposed estimation procedure combines the Horvitz–Thompson (1952) adjustment for missing species with the concept of sample coverage, which is used to properly estimate the relative abundances of the species discovered in the sample. Simulation results show that the proposed estimator works well under various abundance models, even when a relatively large fraction of the species is missing. Three real data sets, two from biology and one from numismatics, are given for illustration.
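
The abstract names the two ingredients of the proposed estimator but does not state its formula. The sketch below assumes the form in which this estimator is commonly quoted: Good's (1953) coverage estimate C = 1 - f1/n (f1 = number of species seen exactly once, n = sample size), coverage-adjusted abundances C·X_i/n for each observed species, and Horvitz–Thompson weighting of each term by 1 / (1 - (1 - C·X_i/n)^n), the estimated probability that the species appears at least once in n draws. The function names, the fallback used when every species is a singleton, and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def plugin_shannon(counts):
    """Traditional (plug-in) estimator: uses observed proportions X_i / n only,
    so species absent from the sample contribute nothing."""
    n = sum(counts)
    if n == 0:
        raise ValueError("sample must contain at least one individual")
    return -sum((x / n) * math.log(x / n) for x in counts if x > 0)


def coverage_adjusted_shannon(counts):
    """Sketch of a coverage-adjusted Horvitz-Thompson estimator of Shannon's index.

    Assumed form: H = -sum_i C*p_i*log(C*p_i) / (1 - (1 - C*p_i)**n),
    with p_i = X_i / n and C = 1 - f1 / n (Good's coverage estimate).
    """
    counts = [x for x in counts if x > 0]  # only observed species enter the sum
    n = sum(counts)
    if n == 0:
        raise ValueError("sample must contain at least one individual")
    f1 = sum(1 for x in counts if x == 1)  # number of singleton species
    if f1 == n:
        # Assumption: if every individual is a singleton, C would be 0;
        # use f1 - 1 instead so the estimate stays finite.
        f1 = n - 1
    coverage = 1 - f1 / n
    h = 0.0
    for x in counts:
        p = coverage * x / n  # coverage-adjusted relative abundance
        inclusion = 1 - (1 - p) ** n  # P(species seen at least once in n draws)
        h -= p * math.log(p) / inclusion  # Horvitz-Thompson weighting
    return h


if __name__ == "__main__":
    sample = [5, 3, 2, 1, 1, 1]  # hypothetical species counts, not data from the paper
    print(plugin_shannon(sample), coverage_adjusted_shannon(sample))
```

In this sketch, a sample with no singletons has C = 1, so the only change from the plug-in value is the division by the inclusion probability, which inflates each term slightly. Singletons both lower C and enlarge the Horvitz–Thompson weights, which is where most of the upward correction for unseen species comes from.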

References

  • Ashbridge, J. and Goudie, I.B.J. (2000) Coverage-adjusted estimators for mark-recapture in heterogeneous populations. Communications in Statistics: Simulation and Computation, 29, 1215–37.

  • Basharin, G.P. (1959) On a statistical estimate for the entropy of a sequence of independent random variables. Theory of Probability and Its Applications, 4, 333–6.

  • Batten, L.A. (1976) Bird communities of some Killarney woodlands. Proceedings of the Royal Irish Academy, 76, 285–313.

  • Bunge, J. and Fitzpatrick, M. (1993) Estimating the number of species: a review. Journal of the American Statistical Association, 88, 364–73.

  • Bunge, J., Fitzpatrick, M., and Handley, J. (1995) Comparison of three estimators of the number of species. Journal of Applied Statistics, 22, 45–59.

  • Chao, A. and Lee, S.-M. (1992) Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87, 210–17.

  • Chao, A., Hwang, W.-H., Chen, Y.-C., and Kuo, C.-Y. (2000) Estimating the number of shared species in two communities. Statistica Sinica, 10, 227–46.

  • Chao, A., Ma, M.-C., and Yang, M.C.K. (1993) Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika, 80, 193–201.

  • Colwell, R.K. and Coddington, J.A. (1994) Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society, London B, 345, 101–18.

  • Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap, Chapman and Hall, New York.

  • Engen, S. (1978) Stochastic Abundance Models, Halsted Press, New York.

  • Esty, W. (1986) The efficiency of Good's nonparametric coverage estimator. The Annals of Statistics, 14, 1257–60.

  • Good, I.J. (1953) The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–64.

  • Haas, P. and Stokes, L. (1998) Estimating the number of classes in a finite population. Journal of the American Statistical Association, 93, 1475–87.

  • Holst, L. (1981) Some asymptotic results for incomplete multinomial or Poisson samples. Scandinavian Journal of Statistics, 8, 243–6.

  • Horvitz, D.G. and Thompson, D.J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–85.

  • Hutcheson, K. and Shenton, L.R. (1974) Some moments of an estimate of Shannon's measure of information. Communications in Statistics, 3, 89–94.

  • Janzen, D.H. (1973a) Sweep samples of tropical foliage insects: description of study sites, with data on species abundances and size distributions. Ecology, 54, 659–86.

  • Janzen, D.H. (1973b) Sweep samples of tropical foliage insects: effects of seasons, vegetation types, elevation, time of day, and insularity. Ecology, 54, 687–708.

  • MacArthur, R.H. (1957) On the relative abundances of bird species. Proceedings of the National Academy of Sciences, U.S.A., 43, 293–5.

  • Magurran, A.E. (1988) Ecological Diversity and Its Measurement, Princeton University Press, Princeton, New Jersey.

  • Mandelbrot, B. (1977) Fractals: Form, Chance, and Dimension, Freeman, San Francisco.

  • Norris III, J.L. and Pollock, K.H. (1998) Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species. Environmental and Ecological Statistics, 5, 391–402.

  • Peet, R.K. (1974) The measurement of species diversity. Annual Review of Ecology and Systematics, 5, 285–307.

  • Pielou, E.C. (1975) Ecological Diversity, Wiley, New York.

  • Smith, W. and Grassle, J.F. (1977) Sampling properties of a family of diversity measures. Biometrics, 33, 283–92.

  • Solow, A.R. (1993) A simple test for change in community structure. Journal of Animal Ecology, 62, 191–3.

  • Thompson, S.K. (1992) Sampling, Wiley, New York.

  • Zahl, S. (1977) Jackknifing an index of diversity. Ecology, 58, 907–13.

  • Zipf, G.K. (1965) Human Behavior and the Principle of Least Effort, Addison-Wesley, New York.

Cite this article

Chao, A., Shen, TJ. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics 10, 429–443 (2003). https://doi.org/10.1023/A:1026096204727
