Efficient Hierarchical Clustering Algorithms Using Partially Overlapping Partitions

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

Clustering is an important data exploration task, and agglomerative hierarchical clustering is one of its prominent algorithms. Roughly, in each iteration it merges the closest pair of clusters. It was first proposed in 1951, and numerous modifications have followed. Its strengths are that it offers a natural, simple, and non-parametric grouping of similar objects and can find clusters of different shapes, both spherical and arbitrary. However, its large CPU time and high memory requirements limit its use on large data sets. In this paper we show that geometric metric (centroid, median, and minimum variance) algorithms obey a 90-10 relationship: roughly the first 90% of iterations are spent merging clusters whose distance is less than 10% of the maximum merging distance. We exploit this characteristic through partially overlapping partitioning. Experiments and analyses show that different types of existing algorithms benefit, with drastic reductions in CPU time and memory. Other contributions of this paper include a comparison of multi-dimensional versus single-dimensional partitioning, and analytical and experimental discussions on setting parameters such as the number of partitions and the dimensions used for partitioning.
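
The 90-10 relationship stated in the abstract can be checked empirically on a small data set. The following is a minimal sketch, not the authors' implementation: it runs a naive centroid-linkage agglomerative clustering and reports the fraction of merges whose distance falls below 10% of the maximum merging distance. The function names `centroid_merge_distances` and `fraction_below`, the synthetic four-blob data, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
import random


def centroid_merge_distances(points):
    """Naive centroid-linkage agglomerative clustering (illustrative only).

    Returns the merge distance of every iteration, n - 1 values in total.
    """
    # Each cluster is stored as (centroid, size).
    clusters = [(tuple(p), 1) for p in points]
    distances = []
    while len(clusters) > 1:
        # Find the closest pair of cluster centroids by exhaustive search.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
        # Remove j first (j > i) so that index i stays valid.
        clusters.pop(j)
        clusters.pop(i)
        clusters.append((merged, ni + nj))
        distances.append(d)
    return distances


def fraction_below(distances, ratio=0.10):
    """Fraction of merges with distance below ratio * (maximum merge distance)."""
    # Centroid linkage is not guaranteed to be monotone, so take the maximum
    # over all merges rather than assuming the last merge is the largest.
    threshold = ratio * max(distances)
    return sum(d < threshold for d in distances) / len(distances)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical test data: four well-separated Gaussian blobs in 2-D.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    data = [(cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
            for cx, cy in centers for _ in range(40)]
    dists = centroid_merge_distances(data)
    print("merges:", len(dists))
    print("fraction below 10% of max distance:", round(fraction_below(dists), 3))
```

The exhaustive closest-pair search makes this sketch cubic in the number of points, which is the kind of cost the paper sets out to reduce; it is meant only for measuring the merge-distance distribution on small inputs.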

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dash, M., Liu, H. (2001). Efficient Hierarchical Clustering Algorithms Using Partially Overlapping Partitions. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_52

  • DOI: https://doi.org/10.1007/3-540-45357-1_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4

  • eBook Packages: Springer Book Archive
