A Voronoi Diagram Approach to Autonomous Clustering

Koivistoinen, Heidi; Ruuska, Minna; Elomaa, Tapio

doi:10.1007/11893318_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4265)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Clustering is a basic tool in unsupervised machine learning and data mining. Distance-based clustering algorithms rarely have the means to autonomously come up with the correct number of clusters from the data. A recent approach to identifying the natural clusters is to compare the point densities in different parts of the sample space.

In this paper we put forward an agglomerative clustering algorithm that accesses density information by constructing a Voronoi diagram for the input sample. The volumes of the point cells directly reflect the point density in the respective parts of the instance space: the smaller a point's cell, the denser its neighbourhood. Scanning once through the input points and their Voronoi cells, we combine the densest parts of the instance space into clusters.
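The abstract does not spell out the exact merge rule, so the following is only a minimal sketch of the idea under stated assumptions: 2-D points, a Voronoi diagram approximated on a square raster and clipped to a bounding box (instead of an exact construction, e.g. with Qhull), and a hypothetical density cutoff of `factor` times the median cell area. The names `raster_voronoi`, `voronoi_clusters`, `resolution` and `factor` are illustrative, not the authors' code. Cell area stands in for inverse density, and a single scan from the smallest cell upward merges dense neighbouring cells.

```python
from collections import defaultdict


def raster_voronoi(points, bbox, resolution):
    """Approximate the Voronoi cell of every 2-D point on a square raster.

    Each raster-cell centre inside bbox is assigned to its nearest input
    point, so the raster cells owned by a point form a discretised version
    of its Voronoi cell, clipped to bbox (assumption: unbounded cells are
    simply cut off at the box). Returns (areas, neighbours): areas[i] is the
    approximate area of point i's clipped cell, neighbours[i] the set of
    points whose cells touch it on the raster.
    """
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) / resolution
    dy = (ymax - ymin) / resolution
    owner = [[0] * resolution for _ in range(resolution)]
    areas = [0.0] * len(points)
    for r in range(resolution):
        cy = ymin + (r + 0.5) * dy
        for c in range(resolution):
            cx = xmin + (c + 0.5) * dx
            # Nearest input point to this raster-cell centre.
            best = min(range(len(points)),
                       key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2)
            owner[r][c] = best
            areas[best] += dx * dy
    neighbours = defaultdict(set)
    for r in range(resolution):
        for c in range(resolution):
            a = owner[r][c]
            # A different owner below or to the right means the cells share a border.
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < resolution and cc < resolution and owner[rr][cc] != a:
                    neighbours[a].add(owner[rr][cc])
                    neighbours[owner[rr][cc]].add(a)
    return areas, neighbours


def voronoi_clusters(points, bbox, resolution=70, factor=2.0):
    """Hypothetical one-pass merge (not the paper's exact rule).

    A point is 'dense' if its cell area is at most factor * median area.
    Scanning points from the smallest cell to the largest, each dense point
    is merged with its dense Voronoi neighbours; each sparse point joins the
    cluster of its densest dense neighbour, if it has one.
    """
    areas, neighbours = raster_voronoi(points, bbox, resolution)
    cutoff = factor * sorted(areas)[len(areas) // 2]
    dense = [a <= cutoff for a in areas]
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in sorted(range(len(points)), key=lambda k: areas[k]):
        dense_nbrs = [j for j in neighbours[i] if dense[j]]
        if dense[i]:
            for j in dense_nbrs:
                parent[find(j)] = find(i)
        elif dense_nbrs:
            j = min(dense_nbrs, key=lambda k: areas[k])
            parent[find(i)] = find(j)

    # Relabel union-find roots as consecutive cluster ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Two 5x5 grids of points separated by a wide gap.
A = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
B = [(3.0 + 0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
labels = voronoi_clusters(A + B, bbox=(-0.05, -0.05, 3.45, 0.45))
print(labels)  # 25 zeros (grid A) followed by 25 ones (grid B)
```

In this toy input the points facing the gap own large, stretched cells and are classed as sparse, yet each still attaches to its dense neighbour inside the same grid, so the sketch finds two clusters without being told k. The raster keeps the example dependency-free at the cost of accuracy; a finer `resolution` gives closer area estimates.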

Our empirical experiments demonstrate that the proposed algorithm produces high-accuracy clusterings for many different types of data. The Voronoi approach clearly outperforms the k-means algorithm on data conforming to its underlying assumptions.

Author information

Authors and Affiliations

Institute of Software Systems, Tampere University of Technology, P. O. Box 553, FI-33101, Tampere, Finland
Heidi Koivistoinen, Minna Ruuska & Tapio Elomaa

Editor information

Editors and Affiliations

Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski
University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Meme Media Laboratory, Hokkaido University Sapporo, Kita 13, Nishi 8, Kita-ku, P.O. Box, 060-8628, Sapporo, Japan
Klaus P. Jantke

About this paper

Cite this paper

Koivistoinen, H., Ruuska, M., Elomaa, T. (2006). A Voronoi Diagram Approach to Autonomous Clustering. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_17

DOI: https://doi.org/10.1007/11893318_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46491-4
Online ISBN: 978-3-540-46493-8
eBook Packages: Computer Science, Computer Science (R0)