How Many Bee Species? A Case Study in Determining the Number of Clusters

  • Conference paper

Abstract

It is argued that the determination of the best number of clusters k depends crucially on the aim of clustering. Existing supposedly “objective” methods of estimating k ignore this. Instead, k can be determined by listing a number of requirements for a good clustering in the given application and finding a k that fulfils them all. The approach is illustrated by application to the problem of finding the number of species in a data set of Australasian Tetragonula bees. Requirements here include two new statistics formalising the largest within-cluster gap and cluster separation. Due to the typical nature of expert knowledge, it is difficult to make requirements precise, and a number of subjective decisions are involved.
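As a rough illustration of the requirement-based idea, the sketch below keeps every k whose clustering satisfies two conditions at once. The abstract does not give the paper's exact definitions, so everything here is an assumption made for illustration: the within-cluster gap is taken as the largest edge of a cluster's minimum spanning tree, separation as the smallest dissimilarity between points in different clusters, and the function names and the thresholds `max_gap` and `min_sep` are hypothetical.

```python
from itertools import combinations


def widest_within_cluster_gap(dist, members):
    """Largest edge of the minimum spanning tree of one cluster (Prim's algorithm).

    Assumed stand-in for the "largest within-cluster gap"; not the paper's definition.
    """
    members = list(members)
    if len(members) < 2:
        return 0.0
    # best[j] = smallest dissimilarity from point j to the tree built so far
    best = {j: dist[members[0]][j] for j in members[1:]}
    widest = 0.0
    while best:
        j = min(best, key=best.get)           # next point to join the tree
        widest = max(widest, best.pop(j))     # length of the edge that attaches it
        for m in best:
            best[m] = min(best[m], dist[j][m])
    return widest


def min_separation(dist, labels):
    """Smallest dissimilarity between two points in different clusters (assumed definition)."""
    return min(
        dist[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] != labels[j]
    )


def admissible_ks(dist, clusterings, max_gap, min_sep):
    """Return every k whose clustering meets both hypothetical requirements.

    clusterings maps k to a list of cluster labels, one per observation.
    """
    ks = []
    for k, labels in sorted(clusterings.items()):
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)
        gap = max(widest_within_cluster_gap(dist, m) for m in clusters.values())
        sep = min_separation(dist, labels) if len(clusters) > 1 else float("inf")
        if gap <= max_gap and sep >= min_sep:
            ks.append(k)
    return ks
```

The function takes a full dissimilarity matrix and one label vector per candidate k, so any clustering method can supply the candidates. In practice the thresholds would have to come from subject-matter knowledge, which is where the subjective decisions mentioned in the abstract enter.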

Author information

Corresponding author

Correspondence to Christian Hennig.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hennig, C. (2014). How Many Bee Species? A Case Study in Determining the Number of Clusters. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_5
