A Coarse Grained Parallel Algorithm for Closest Larger Ancestors in Trees with Applications to Single Link Clustering

  • Conference paper
High Performance Computing and Communications (HPCC 2005)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 3726)

Abstract

Hierarchical clustering methods are important in many data mining and pattern recognition tasks. In this paper we present an efficient coarse grained parallel algorithm for Single Link Clustering, a standard inter-cluster linkage metric. Our approach is to first describe algorithms for the Prefix Larger Integer Set and the Closest Larger Ancestor problems and then to show how these can be applied to solve the Single Link Clustering problem. In an extensive performance analysis, an implementation of these algorithms on a Linux-based cluster has been shown to scale well, exhibiting near-linear relative speedup.
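
To make the Closest Larger Ancestor problem concrete, the following is a minimal sequential sketch in Python. It is not the coarse grained parallel algorithm of the paper, which is not reproduced on this page. It assumes the usual reading of the problem: given a rooted tree with a weight on every node, find for each node the nearest ancestor whose weight is strictly larger. The function name `closest_larger_ancestors` and the `parent`/`weight` array layout are illustrative choices, not names from the paper.

```python
from collections import deque


def closest_larger_ancestors(parent, weight):
    """Sequential sketch (assumed problem definition, not the paper's method).

    parent[v] is the index of v's parent, or -1 if v is a root.
    Returns cla, where cla[v] is the nearest ancestor of v with
    weight strictly larger than weight[v], or -1 if none exists.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for v, p in enumerate(parent):
        if p == -1:
            roots.append(v)
        else:
            children[p].append(v)

    cla = [-1] * n
    queue = deque(roots)
    # Visit nodes top-down so that cla[] of every ancestor is already known.
    while queue:
        v = queue.popleft()
        candidate = parent[v]
        # If a candidate is not larger than v, every node between it and its
        # own closest larger ancestor is also not larger than v, so jump there.
        while candidate != -1 and weight[candidate] <= weight[v]:
            candidate = cla[candidate]
        cla[v] = candidate
        queue.extend(children[v])
    return cla


# Hypothetical example tree: node 0 is the root; 1 and 4 are its children;
# 2 and 3 are children of 1.
example_parent = [-1, 0, 1, 1, 0]
example_weight = [10, 3, 5, 2, 12]
example_result = closest_larger_ancestors(example_parent, example_weight)
```

In this example node 2 (weight 5) skips its parent (weight 3) and resolves to the root (weight 10), while node 4 (weight 12) has no larger ancestor.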

Research partially supported by the Natural Sciences and Engineering Research Council of Canada



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chan, A., Gao, C., Rau-Chaplin, A. (2005). A Coarse Grained Parallel Algorithm for Closest Larger Ancestors in Trees with Applications to Single Link Clustering. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_96

  • DOI: https://doi.org/10.1007/11557654_96

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29031-5

  • Online ISBN: 978-3-540-32079-1

  • eBook Packages: Computer Science, Computer Science (R0)
