Abstract
The performance of MPI collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not perform well on all systems because of differences in architecture, network parameters, and the buffering scheme of the underlying MPI implementation. In this paper, we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on that system. We also discuss a dynamic topology method that uses the tuned static topology shape but reorders the logical addresses to compensate for run-time variations. A series of experiments was conducted comparing our tuned MPI_Bcast to various native vendor MPI implementations. The results are encouraging and show that our implementations of collective algorithms can significantly improve the performance of current MPI implementations.
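The experiment-driven tuning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the algorithm names, the `synthetic_cost` model, and every parameter value below are assumptions chosen for illustration. On a real system, the `measure` callable would time an actual broadcast instead of evaluating a formula.

```python
import bisect
import math

# Hypothetical candidate broadcast algorithms (names are illustrative only).
ALGORITHMS = ("sequential", "binomial", "segmented_chain")


def synthetic_cost(name, nbytes, nprocs=8, latency=50e-6,
                   per_byte=1e-8, segment=8192):
    """Stand-in for a real timing run: a simple latency/bandwidth model.

    All parameter values are assumptions, not measurements from the paper.
    """
    step = latency + per_byte * nbytes
    if name == "sequential":          # root sends to each rank in turn
        return (nprocs - 1) * step
    if name == "binomial":            # tree of depth ceil(log2(nprocs))
        return math.ceil(math.log2(nprocs)) * step
    if name == "segmented_chain":     # pipeline of fixed-size segments
        nseg = max(1, math.ceil(nbytes / segment))
        seg_step = latency + per_byte * min(nbytes, segment)
        return (nprocs - 2 + nseg) * seg_step
    raise ValueError(f"unknown algorithm: {name}")


def build_decision_table(names, message_sizes, measure):
    """Run one experiment per (algorithm, size) and keep the fastest."""
    table = {}
    for nbytes in message_sizes:
        timings = {name: measure(name, nbytes) for name in names}
        table[nbytes] = min(timings, key=timings.get)
    return table


def select(table, nbytes):
    """Pick the algorithm tuned for the largest size not above nbytes."""
    sizes = sorted(table)
    index = max(0, bisect.bisect_right(sizes, nbytes) - 1)
    return table[sizes[index]]


if __name__ == "__main__":
    tuned = build_decision_table(ALGORITHMS, [1024, 4 * 1024 * 1024],
                                 synthetic_cost)
    for size in (512, 1024, 8 * 1024 * 1024):
        print(size, select(tuned, size))
```

The table is built once per system and consulted at call time, so the cost of the experiments is paid up front rather than on every collective call.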
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fagg, G.E., Vadhiyar, S.S., Dongarra, J.J. (2000). ACCT: Automatic Collective Communications Tuning. In: Dongarra, J., Kacsuk, P., Podhorszki, N. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2000. Lecture Notes in Computer Science, vol 1908. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45255-9_48
DOI: https://doi.org/10.1007/3-540-45255-9_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41010-2
Online ISBN: 978-3-540-45255-3
eBook Packages: Springer Book Archive