DOI: 10.1145/1006209.1006251
An analysis of the impact of MPI overlap and independent progress

Published: 26 June 2004

ABSTRACT

The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of MPI to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have been poorly studied at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. This paper extends this previous work by further qualifying the source of the performance advantage (offload, overlap, or independent progress).



Published in

ICS '04: Proceedings of the 18th Annual International Conference on Supercomputing
June 2004, 360 pages
ISBN: 1581138393
DOI: 10.1145/1006209

    Copyright © 2004 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 584 of 2,055 submissions, 28%
