Abstract
In cluster computing, the communication functions of current MPI libraries are not well optimized. Performance is worst when multiple sources and/or destinations are involved, as in collective communication. Our algorithms use multidimensional factorization together with pairwise exchange and dissemination communication methods to improve performance. They outperform previous algorithms such as ring, recursive doubling, and dissemination. Experimental results show an improvement of roughly 50% over MPICH version 1.2.6 on a Linux cluster.
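The abstract names several communication schedules without showing what they look like. As a rough illustration only, the Python sketch below computes, for one rank out of p processes, the per-step partners of the baseline algorithms the abstract compares against (ring, recursive doubling, dissemination) and of a pairwise exchange schedule of the kind commonly used for alltoall. The last two functions are a hypothetical reading of "multidimensional factorization": the p processes are arranged as a grid whose side lengths are the prime factors of p, and a ring runs along each dimension in turn. That reading is an assumption, not the authors' published algorithm. No MPI calls are made, and all function names are hypothetical.

```python
"""Per-step partner schedules for common collective algorithms (illustrative sketch only)."""


def ring_schedule(rank, p):
    """Ring: in each of p - 1 steps, send to the right neighbour and receive from the left."""
    return [((rank + 1) % p, (rank - 1) % p) for _ in range(p - 1)]


def recursive_doubling_partners(rank, p):
    """Recursive doubling: at step k, exchange with rank XOR 2**k (p must be a power of two)."""
    if p < 1 or (p & (p - 1)) != 0:
        raise ValueError("this sketch of recursive doubling requires p to be a power of two")
    return [rank ^ (1 << k) for k in range(p.bit_length() - 1)]


def dissemination_schedule(rank, p):
    """Dissemination: at step k, send to rank + 2**k and receive from rank - 2**k (mod p).

    ceil(log2(p)) steps are needed; (p - 1).bit_length() computes that without floats.
    """
    steps = (p - 1).bit_length()
    return [((rank + (1 << k)) % p, (rank - (1 << k)) % p) for k in range(steps)]


def pairwise_exchange_schedule(rank, p):
    """Pairwise exchange (alltoall style): p - 1 steps of (send_to, recv_from) pairs.

    When p is a power of two the partner is rank XOR k, so each step is a true swap;
    otherwise a rank sends to rank + k and receives from rank - k (mod p).
    """
    if (p & (p - 1)) == 0:
        return [(rank ^ k, rank ^ k) for k in range(1, p)]
    return [((rank + k) % p, (rank - k) % p) for k in range(1, p)]


def prime_factors(p):
    """Return the prime factors of p in non-decreasing order, e.g. 12 -> [2, 2, 3]."""
    factors, d = [], 2
    while d * d <= p:
        while p % d == 0:
            factors.append(d)
            p //= d
        d += 1
    if p > 1:
        factors.append(p)
    return factors


def factored_ring_steps(p):
    """HYPOTHETICAL: step count when the p ranks form a grid with one prime factor per
    dimension and a ring runs along each dimension in turn, i.e. sum of (factor - 1)."""
    return sum(f - 1 for f in prime_factors(p))


if __name__ == "__main__":
    for p in (8, 12, 16):
        print(f"p={p}: ring {len(ring_schedule(0, p))} steps, "
              f"dissemination {len(dissemination_schedule(0, p))} steps, "
              f"factored grid {prime_factors(p)} -> {factored_ring_steps(p)} steps")
```

For p = 12, the ring needs 11 steps, while the hypothetical 2 × 2 × 3 grid needs 4. In an allgather the later dimensions forward already-collected blocks, so fewer steps come at the cost of larger messages per step; which schedule is faster in practice depends on message size and network latency.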
This research was supported by the Korea Science and Engineering Foundation (grant no. R01-2001-0341-0). Preliminary results of this paper are to appear at the Int. Conf. on Parallel and Distributed Systems, July 22, 2005.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, D., Kim, D. (2008). Design of Fast Collective Communication Functions on Clustered Workstations with Ethernet and Myrinet. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC ALPS 2005 2006. Lecture Notes in Computer Science, vol 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77703-8
Online ISBN: 978-3-540-77704-5
eBook Packages: Computer Science (R0)