SAHN Clustering in Arbitrary Metric Spaces Using Heuristic Nearest Neighbor Search

Kriege, Nils; Mutzel, Petra; Schäfer, Till

doi:10.1007/978-3-319-04657-0_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8344)

Included in the following conference series:

International Workshop on Algorithms and Computation

Abstract

Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques are classical clustering methods that are used heavily in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity, even in the best case, still limits their applicability to small data sets. We present a new pivot-based heuristic SAHN clustering algorithm that exploits the properties of metric distance measures to obtain a best-case running time of $\mathcal{O}(n\log n)$ for input size $n$. Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. In extensive experimental evaluations on real-world and synthetic data sets, we compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and running time. The evaluations show a subquadratic running time in practice and a very low memory footprint.
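The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general pivot idea it alludes to, not the authors' method. It assumes a metric `dist`, a small fixed set of pivots, and uses the triangle inequality $|d(x,p) - d(y,p)| \le d(x,y)$ to estimate distances from a precomputed point-to-pivot table. All function names (`pivot_table`, `lower_bound`, `heuristic_nearest_neighbor`) are hypothetical. With a constant number of pivots the table costs a linear number of exact distance computations; the nearest-neighbor scan below is a plain linear pass per query and does not reproduce the index structure that the stated $\mathcal{O}(n\log n)$ best case would require.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pivot_table(points: Sequence[T], pivots: Sequence[T],
                dist: Callable[[T, T], float]) -> List[List[float]]:
    """Exact distances from every point to every pivot (n * k computations)."""
    return [[dist(x, p) for p in pivots] for x in points]


def lower_bound(row_x: Sequence[float], row_y: Sequence[float]) -> float:
    """Largest |d(x,p) - d(y,p)| over all pivots p.

    By the triangle inequality this never exceeds the true distance d(x,y).
    """
    return max(abs(a - b) for a, b in zip(row_x, row_y))


def heuristic_nearest_neighbor(i: int,
                               table: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Index j != i with the smallest pivot lower bound to point i.

    This is a heuristic: the returned point minimizes the bound,
    which is not necessarily the exact nearest neighbor.
    """
    best_j, best = -1, float("inf")
    for j, row in enumerate(table):
        if j == i:
            continue
        lb = lower_bound(table[i], row)
        if lb < best:
            best_j, best = j, lb
    return best_j, best


# Hypothetical toy data: points on a line, a single pivot at 0.
points = [0.0, 1.0, 5.0, 5.5]
table = pivot_table(points, pivots=[0.0], dist=lambda a, b: abs(a - b))
print(heuristic_nearest_neighbor(2, table))  # (3, 0.5)
```

In this one-dimensional toy case the bound happens to equal the exact distance; in general it is only a lower bound, and its tightness depends on how the pivots are chosen.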

Author information

Authors and Affiliations

Dept. of Computer Science, Technische Universität Dortmund, Germany
Nils Kriege, Petra Mutzel & Till Schäfer

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, 721302, Kharagpur, India
Sudebkumar Prasant Pal
National Institute of Informatics, 2-1-2 Hitotsubashi, 101-8430, Chiyoda-ku, Tokyo, Japan
Kunihiko Sadakane

About this paper

Cite this paper

Kriege, N., Mutzel, P., Schäfer, T. (2014). SAHN Clustering in Arbitrary Metric Spaces Using Heuristic Nearest Neighbor Search. In: Pal, S.P., Sadakane, K. (eds) Algorithms and Computation. WALCOM 2014. Lecture Notes in Computer Science, vol 8344. Springer, Cham. https://doi.org/10.1007/978-3-319-04657-0_11

DOI: https://doi.org/10.1007/978-3-319-04657-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04656-3
Online ISBN: 978-3-319-04657-0
eBook Packages: Computer Science, Computer Science (R0)
