A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances

Masciari, Elio; Mazzeo, Giuseppe Massimiliano; Zaniolo, Carlo

doi:10.1007/978-3-642-37456-2_10

Elio Masciari²³,
Giuseppe Massimiliano Mazzeo²³ &
Carlo Zaniolo²⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7819))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

9698 Accesses
5 Citations

Abstract

A simple hierarchical clustering algorithm called CLUBS (for CLustering Using Binary Splitting) is proposed. CLUBS is faster and more accurate than existing algorithms, including k-means and its recently proposed refinements. The algorithm consists of a divisive phase and an agglomerative phase; during these two phases, the samples are repartitioned using a least quadratic distance criterion possessing unique analytical properties that we exploit to achieve a very fast computation. CLUBS derives good clusters without requiring input from users, and it is robust and impervious to noise, while providing better speed and accuracy than methods, such as BIRCH, that are endowed with the same critical properties.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: Optics: Ordering points to identify the clustering structure. SIGMOD Record 28(2), 49–60 (1999)
Article Google Scholar
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA, pp. 1027–1035 (2007)
Google Scholar
Ben-David, S., Ackerman, M.: Measures of clustering quality: A working set of axioms for clustering. In: NIPS, pp. 121–128 (2008)
Google Scholar
Cheung, Y.M.: k*-means: A new generalized k-means clustering algorithm. Pattern Recognition Letters 24(15), 2883–2893 (2003)
Article MATH Google Scholar
Einbond, L.S., Su, T., Wu, H., Friedman, R., Wang, X., Ramirez, A., Kronenberg, F., Weinstein, I.B.: The growth inhibitory effect of actein on human breast cancer cells is associated with activation of stress response pathways. I. J. of Cancer 121(9), 2073–2083 (2007)
Google Scholar
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
Google Scholar
Flesca, S., Manco, G., Masciari, E., Pontieri, L., Pugliese, A.: Fast detection of xml structural similarity. TKDE 17(2), 160–175 (2005)
Google Scholar
Graham, K., De Las Morenas, A., Tripathi, A., King, C., Kavanah, M., Mendez, J., Stone, M., Slama, J., Miller, M., Antoine, G., Willers, H., Sebastiani, P., Rosenberg, C.L.: Gene expression in histologically normal epithelium from breast cancer patients and from cancer-free prophylactic mastectomy patients shares a similar profile. Br. J. Cancer 102(8), 1284–1293 (2010)
Article Google Scholar
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann (2000)
Google Scholar
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: 5-th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
Google Scholar
Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of lloyd-type methods for the k-means problem. In: FOCS (2006)
Google Scholar
Veenman, C.J., Reinders, M.J.T.: The nearest subclass classifier: A compromise between the nearest mean and nearest neighbor classifier. IEEE Trans. Pattern Anal. Mach. Intell. 27(9), 1417–1429 (2005)
Article Google Scholar
Wang, W., Yang, J., Muntz, R.R.: Sting: A statistical information grid approach to spatial data mining. In: VLDB, pp. 186–195 (1997)
Google Scholar
Zhang, T., Ramakrishnan, R., Livny, M.: Birch: An efficient data clustering method for very large databases. In: SIGMOD, pp. 103–114 (1996)
Google Scholar

Download references

Author information

Authors and Affiliations

ICAR-CNR, Italy
Elio Masciari & Giuseppe Massimiliano Mazzeo
UCLA, USA
Carlo Zaniolo

Authors

Elio Masciari
View author publications
You can also search for this author in PubMed Google Scholar
Giuseppe Massimiliano Mazzeo
View author publications
You can also search for this author in PubMed Google Scholar
Carlo Zaniolo
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Dept. of Computer Science and Information Engineering, Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
Vincent S. Tseng
Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, P.O. Box 123, 2007, Sydney, NSW, Australia
Longbing Cao & Guandong Xu &
Asian Office of Aerospace Research and Development (AOARD), Air Force Office of Scientific Research (AFOSR), Air Force Research Laboratory USA, Osaka University, 7-23-17 Roppongi, 106-0032, Minato-ku, Tokyo, Japan
Hiroshi Motoda

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Masciari, E., Mazzeo, G.M., Zaniolo, C. (2013). A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science(), vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_10

Download citation

DOI: https://doi.org/10.1007/978-3-642-37456-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37455-5
Online ISBN: 978-3-642-37456-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics