A framework for characterizing overlap of communication and computation in parallel applications

Shet, Aniruddha G.; Sadayappan, P.; Bernholdt, David E.; Nieplocha, Jarek; Tipparaju, Vinod

doi:10.1007/s10586-007-0046-3

Published: 14 February 2008

Cluster Computing, volume 11, pages 75–90 (2008)

Abstract

Effective overlap of computation and communication is a well-understood technique for latency hiding and can yield significant performance gains for applications on high-end computers. In this paper, we propose an instrumentation framework for message-passing systems that characterizes the degree to which communication overlaps with computation during the execution of parallel applications. The inability to obtain precise time-stamps for pertinent communication events is a significant problem, which the framework addresses by generating minimum and maximum bounds on the achieved overlap. These overlap measures can aid application developers and system designers in investigating scalability issues. The approach has been used to instrument two MPI implementations as well as the ARMCI system. The implementation resides entirely within the communication library and thus integrates well with existing approaches that operate outside the library. The utility of the framework is demonstrated by analyzing communication-computation overlap for micro-benchmarks and the NAS benchmarks; the insights obtained are used to modify the NAS SP benchmark, resulting in improved overlap.
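The idea of bounding overlap when exact time-stamps are unavailable can be illustrated with a minimal sketch. This is not the framework described in the article; it is an illustration under stated assumptions: for one transfer, the library is assumed to know only the length of the window between initiating the transfer and detecting its completion, the time within that window spent in application computation, and an estimate of how long the transfer itself takes. The names `TransferWindow` and `overlap_bounds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TransferWindow:
    """Hypothetical record for one transfer (all times in the same unit)."""
    window: float    # time from initiation to detected completion
    compute: float   # part of the window spent in application computation
    transfer: float  # estimated duration of the data transfer itself


def overlap_bounds(w: TransferWindow) -> tuple[float, float]:
    """Return (min_overlap, max_overlap) for a transfer of unknown placement.

    Assumption: the transfer may occupy any part of the window, so only
    bounds on its overlap with computation can be stated.
    """
    if not (0 <= w.compute <= w.window and 0 <= w.transfer <= w.window):
        raise ValueError("compute and transfer must fit inside the window")

    in_library = w.window - w.compute  # time spent inside the library

    # Best case: the transfer coincides with computation as much as possible.
    max_overlap = min(w.transfer, w.compute)

    # Worst case: the transfer fills the in-library time first, and only
    # the remainder is forced to coincide with computation.
    min_overlap = max(0.0, w.transfer - in_library)

    return min_overlap, max_overlap


if __name__ == "__main__":
    lo, hi = overlap_bounds(TransferWindow(window=10.0, compute=6.0, transfer=5.0))
    print(f"overlap is between {lo} and {hi} time units")
```

When the window contains little computation, the upper bound shrinks toward zero; when it contains little in-library time, the lower bound rises toward the transfer time, so the two bounds tighten as the window becomes dominated by one activity.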

Author information

Authors and Affiliations

The Ohio State University, Columbus, OH, 43210, USA
Aniruddha G. Shet & P. Sadayappan
Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
David E. Bernholdt
Pacific Northwest National Laboratory, Richland, WA, 99352, USA
Jarek Nieplocha & Vinod Tipparaju

Corresponding author

Correspondence to Aniruddha G. Shet.

About this article

Cite this article

Shet, A.G., Sadayappan, P., Bernholdt, D.E. et al. A framework for characterizing overlap of communication and computation in parallel applications. Cluster Comput 11, 75–90 (2008). https://doi.org/10.1007/s10586-007-0046-3

Received: 16 March 2007
Accepted: 29 October 2007
Published: 14 February 2008
Issue Date: March 2008
DOI: https://doi.org/10.1007/s10586-007-0046-3
