
A framework for characterizing overlap of communication and computation in parallel applications

Published in: Cluster Computing

Abstract

Effective overlap of computation and communication is a well-understood technique for latency hiding and can yield significant performance gains for applications on high-end computers. In this paper, we propose an instrumentation framework for message-passing systems that characterizes the degree of overlap of communication with computation in the execution of parallel applications. The inability to obtain precise time-stamps for pertinent communication events is a significant problem; we address it by generating minimum and maximum bounds on the achieved overlap. These overlap measures can aid application developers and system designers in investigating scalability issues. The approach has been used to instrument two MPI implementations as well as the ARMCI system. The implementation resides entirely within the communication library and thus integrates well with existing approaches that operate outside the library. The utility of the framework is demonstrated by analyzing communication-computation overlap for micro-benchmarks and the NAS benchmarks, and the insights obtained are used to modify the NAS SP benchmark, resulting in improved overlap.
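The min/max bound idea can be illustrated with a small sketch. This is not the paper's actual instrumentation; all names, and the particular bounding logic, are illustrative assumptions. Given timestamps for when a nonblocking operation was posted, when the matching wait was entered, and when it returned, plus independently measured bounds on the raw transfer cost (e.g. from a blocking ping-pong test), the overlap achieved behind computation can be bracketed rather than measured exactly:

```python
def overlap_bounds(t_post, t_wait_call, t_wait_return, t_comm_min, t_comm_max):
    """Bracket the communication time hidden behind computation.

    Assumes the transfer runs contiguously and, if the wait blocks,
    completes exactly when the wait returns.

    t_post        -- when the nonblocking operation was posted
    t_wait_call   -- when the application entered the wait
    t_wait_return -- when the wait returned (completion observed)
    t_comm_min/max -- independently measured bounds on the pure
                      communication cost
    """
    window = t_wait_call - t_post          # computation available for overlap
    blocked = t_wait_return - t_wait_call  # time definitely NOT overlapped
    # Overlapped portion is the transfer time falling before the wait,
    # i.e. the transfer duration minus the time spent blocked, clipped
    # to the computation window.
    min_overlap = min(max(0.0, t_comm_min - blocked), window)
    max_overlap = min(max(0.0, t_comm_max - blocked), window)
    return min_overlap, max_overlap
```

For example, if the wait returns immediately (`t_wait_call == t_wait_return`), the transfer must have completed entirely inside the computation window, so the bounds collapse to the measured transfer-cost bounds; if the wait blocks for longer than the transfer could have taken, the lower bound drops to zero, reflecting that none of the transfer is provably overlapped.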



Author information


Corresponding author

Correspondence to Aniruddha G. Shet.


Cite this article

Shet, A.G., Sadayappan, P., Bernholdt, D.E. et al. A framework for characterizing overlap of communication and computation in parallel applications. Cluster Comput 11, 75–90 (2008). https://doi.org/10.1007/s10586-007-0046-3

