RACHET: An Efficient Cover-Based Merging of Clustering Hierarchies from Distributed Datasets

Samatova, Nagiza F.; Ostrouchov, George; Geist, Al; Melechko, Anatoli V.

doi:10.1023/A:1013988102576

Published: March 2002

Distributed and Parallel Databases, Volume 11, pages 157–180 (2002)

Abstract

This paper presents a hierarchical clustering method named RACHET (Recursive Agglomeration of Clustering Hierarchies by Encircling Tactic) for analyzing multi-dimensional distributed data. A typical clustering algorithm requires bringing all the data into a centralized warehouse, which incurs an O(nd) transmission cost, where n is the number of data points and d is the number of dimensions. For large datasets, this is prohibitively expensive. In contrast, RACHET runs with at most O(n) time, space, and communication costs to build a global hierarchy of comparable clustering quality by merging locally generated clustering hierarchies. RACHET employs the encircling tactic, in which the merges at each stage are chosen so as to minimize the volume of a covering hypersphere. For each cluster centroid, RACHET maintains descriptive statistics of constant complexity to enable these choices. RACHET's framework is applicable to a wide class of centroid-based hierarchical clustering algorithms, such as centroid, medoid, and Ward.
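
The encircling tactic described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Summary` fields (point count, centroid, radius), the function names, and the naive greedy loop are assumptions made for illustration, and the abstract does not say which descriptive statistics RACHET actually keeps. The sketch relies only on standard geometry: the smallest hypersphere covering two hyperspheres has radius (dist + r1 + r2) / 2 unless one already contains the other, and for a fixed dimension minimizing the radius is equivalent to minimizing the volume.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Summary:
    """Hypothetical constant-size description of one cluster."""
    n: int            # number of points in the cluster
    centroid: tuple   # cluster centroid
    radius: float     # radius of a hypersphere around the centroid covering the cluster


def cover_radius(a, b):
    """Radius of the smallest hypersphere covering both cluster spheres."""
    dist = math.dist(a.centroid, b.centroid)
    if dist + b.radius <= a.radius:   # b lies entirely inside a
        return a.radius
    if dist + a.radius <= b.radius:   # a lies entirely inside b
        return b.radius
    return (dist + a.radius + b.radius) / 2.0


def merge(a, b):
    """Combine two summaries without access to the underlying points."""
    n = a.n + b.n
    centroid = tuple((a.n * x + b.n * y) / n
                     for x, y in zip(a.centroid, b.centroid))
    # Assumption: store a radius that is guaranteed to cover both children
    # when measured from the new (weighted-mean) centroid.
    radius = max(math.dist(centroid, a.centroid) + a.radius,
                 math.dist(centroid, b.centroid) + b.radius)
    return Summary(n, centroid, radius)


def agglomerate(summaries):
    """Greedily merge the pair whose covering hypersphere is smallest."""
    active = dict(enumerate(summaries))
    merges = []                       # (left id, right id, new id) per step
    next_id = len(summaries)
    while len(active) > 1:
        i, j = min(combinations(active, 2),
                   key=lambda p: cover_radius(active[p[0]], active[p[1]]))
        active[next_id] = merge(active.pop(i), active.pop(j))
        merges.append((i, j, next_id))
        next_id += 1
    (root,) = active.values()
    return root, merges
```

Because only the summaries are exchanged, each merge touches a constant amount of data per cluster regardless of how many points the cluster holds. The exhaustive pair search here is kept for readability and does not reflect the cost bounds claimed in the abstract.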

Author information

Authors and Affiliations

Computer Science and Mathematics Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN, 37831, USA
Nagiza F. Samatova, George Ostrouchov & Al Geist
Oak Ridge National Laboratory, Molecular-Scale Engineering and Nanoscale Technologies Group, P.O. Box 2008, Oak Ridge, TN, 37831, USA
Anatoli V. Melechko

About this article

Cite this article

Samatova, N.F., Ostrouchov, G., Geist, A. et al. RACHET: An Efficient Cover-Based Merging of Clustering Hierarchies from Distributed Datasets. Distributed and Parallel Databases 11, 157–180 (2002). https://doi.org/10.1023/A:1013988102576

Issue Date: March 2002
DOI: https://doi.org/10.1023/A:1013988102576
