A Voronoi Diagram Approach to Autonomous Clustering

  • Conference paper
Discovery Science (DS 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4265)

Abstract

Clustering is a basic tool in unsupervised machine learning and data mining. Distance-based clustering algorithms rarely have the means to autonomously come up with the correct number of clusters from the data. A recent approach to identifying the natural clusters is to compare the point densities in different parts of the sample space.

In this paper we put forward an agglomerative clustering algorithm which accesses density information by constructing a Voronoi diagram for the input sample. The volumes of the point cells directly reflect the point density in the respective parts of the instance space. Scanning through the input points and their Voronoi cells once, we combine the densest parts of the instance space into clusters.

Our empirical experiments demonstrate that the proposed algorithm is able to come up with a high-accuracy clustering for many different types of data. The Voronoi approach clearly outperforms the k-means algorithm on data conforming to its underlying assumptions.
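
The abstract gives no algorithmic detail, so the following is only a minimal one-dimensional sketch of the general idea, not the authors' method. In one dimension the Voronoi cell of a point is the interval between the midpoints to its two neighbors, so a short cell indicates a dense region. The function names, the `max_cell` threshold, the merge rule, and the mirrored treatment of the two unbounded outer cells are all assumptions made for illustration.

```python
from typing import List


def voronoi_cell_lengths(sorted_points: List[float]) -> List[float]:
    """Length of each point's 1-D Voronoi cell (input sorted, at least 2 points).

    An interior cell spans the midpoints to the left and right neighbors.
    The two outermost cells are unbounded; as an assumption of this sketch
    they are closed off by mirroring the single available gap.
    """
    if len(sorted_points) < 2:
        raise ValueError("need at least two points")
    gaps = [b - a for a, b in zip(sorted_points, sorted_points[1:])]
    padded = [gaps[0]] + gaps + [gaps[-1]]  # mirror the boundary gaps
    return [(left + right) / 2 for left, right in zip(padded, padded[1:])]


def cluster_dense_cells(points: List[float], max_cell: float) -> List[int]:
    """Label the sorted points in a single left-to-right scan.

    Two adjacent points share a cluster when at least one of their cells is
    no longer than `max_cell` (a hypothetical density threshold). Labels are
    returned in the sorted order of the points.
    """
    pts = sorted(points)
    cells = voronoi_cell_lengths(pts)
    labels = [0]
    for i in range(1, len(pts)):
        dense_pair = min(cells[i - 1], cells[i]) <= max_cell
        labels.append(labels[-1] if dense_pair else labels[-1] + 1)
    return labels


if __name__ == "__main__":
    sample = [0, 1, 2, 3, 50, 51, 52, 90]
    print(voronoi_cell_lengths(sample))
    print(cluster_dense_cells(sample, max_cell=2))
```

In this sketch the number of clusters is not passed in; it falls out of the cell sizes and the threshold. Extending the idea beyond one dimension would require a computational geometry library to build the diagram and measure cell volumes, and unbounded cells on the boundary of the sample would need their own treatment.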

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koivistoinen, H., Ruuska, M., Elomaa, T. (2006). A Voronoi Diagram Approach to Autonomous Clustering. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_17

  • DOI: https://doi.org/10.1007/11893318_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46491-4

  • Online ISBN: 978-3-540-46493-8

  • eBook Packages: Computer Science, Computer Science (R0)
