ABSTRACT
The overlap of computation and communication has long been considered a significant performance benefit for applications. Similarly, the ability of MPI to make independent progress (that is, to advance outstanding communication operations while the application is not in the MPI library) is also believed to yield performance benefits. Offloading the work required to support overlap and independent progress to an intelligent network interface is thought to be an ideal solution, but the benefits of this approach have been poorly studied at the application level. Such analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. This paper extends that work by further isolating the source of the performance advantage: offload, overlap, or independent progress.
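The mechanism the abstract describes, overlapping computation with communication through non-blocking MPI calls, can be sketched as follows. This is an illustrative example, not code from the paper: the buffer size, tag, and rank-pairing scheme are arbitrary choices, and whether the compute loop truly overlaps the transfer depends on the MPI implementation's support for independent progress (e.g., NIC offload versus host-driven progress).

```c
/* Sketch: post non-blocking communication, compute, then complete.
 * With independent progress (e.g., NIC offload), the transfer can
 * proceed during the compute loop; with host-driven progress, it
 * may stall until the next MPI call. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    enum { N = 1 << 20 };               /* arbitrary message size */
    static double sendbuf[N], recvbuf[N];
    int peer = rank ^ 1;                /* pair up neighboring ranks */
    MPI_Request reqs[2];

    if (peer < size) {
        /* Post the communication early ... */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... then compute while the messages are (ideally) in flight. */
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += sendbuf[i] * 0.5;

        /* Complete the transfer before reusing either buffer. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        if (rank == 0)
            printf("overlap region done, checksum %g\n", sum);
    }

    MPI_Finalize();
    return 0;
}
```

Had a blocking MPI_Send/MPI_Recv pair been used instead, the compute loop could not begin until the transfer finished; the non-blocking form is what makes overlap possible, and offload is what lets the transfer progress without further MPI calls.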
- N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, and W.-K. Su. Myrinet: A gigabit-per-second local area network. IEEE Micro, 15(1):29--36, February 1995.
- R. Brightwell. A new MPI implementation for Cray SHMEM. Technical report, Sandia National Laboratories.
- R. Brightwell and K. Underwood. Evaluation of an eager protocol optimization for MPI. In Proceedings of EuroPVM/MPI, September 2003.
- R. Brightwell and K. D. Underwood. An analysis of NIC resource usage for offloading MPI. In Proceedings of the 2004 Workshop on Communication Architecture for Clusters, Santa Fe, NM, April 2004.
- R. Brightwell and K. D. Underwood. An initial analysis of the impact of overlap and independent progress for MPI. Submitted for publication, 2004.
- R. B. Brightwell and P. L. Shuler. Design and implementation of MPI on Puma portals. In Proceedings of the Second MPI Developer's Conference, pages 18--25, July 1996.
- Cray Research, Inc. SHMEM Technical Note for C, SG-2516 2.3, October 1994.
- InfiniBand Trade Association. http://www.infinibandta.org, 1999.
- J. Liu, B. Chandrasekaran, J. Wu, W. Jiang, S. Kini, W. Yu, D. Buntinas, P. Wyckoff, and D. K. Panda. Performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics. In The International Conference for High Performance Computing and Communications (SC2003), November 2003.
- A. B. Maccabe, R. Riesen, and D. W. van Dresser. Dynamic processor modes in Puma. Bulletin of the Technical Committee on Operating Systems and Application Environments (TCOS), 8(2):4--12, 1996.
- F. Petrini, W.-C. Feng, A. Hoisie, S. Coll, and E. Frachtenberg. The Quadrics network: High-performance clustering technology. IEEE Micro, 22(1):46--57, January/February 2002.
- F. Petrini, D. J. Kerbyson, and S. Pakin. The case of the missing supercomputer performance: Identifying and eliminating the performance variability on the ASCI Q machine. In Proceedings of the 2003 Conference on High Performance Networking and Computing, November 2003.
- L. Shuler, C. Jong, R. Riesen, D. van Dresser, A. B. Maccabe, L. A. Fisk, and T. M. Stallcup. The Puma operating system for massively parallel computers. In Proceedings of the 1995 Intel Supercomputer User's Group Conference, 1995.
- T. G. Mattson, D. Scott, and S. R. Wheat. A TeraFLOPS supercomputer in 1996: The ASCI TFLOP system. In Proceedings of the 1996 International Parallel Processing Symposium, 1996.
- K. D. Underwood and R. Brightwell. The impact of MPI queue usage on MPI latency. Submitted for publication, 2004.
- J. S. Vetter and F. Mueller. Communication characteristics of large-scale scientific applications for contemporary cluster architectures. In 16th International Parallel and Distributed Processing Symposium (IPDPS'02), pages 27--29, April 2002.
- F. Wong, R. Martin, R. Arpaci-Dusseau, and D. E. Culler. Architectural requirements and scalability of the NAS parallel benchmarks. In Proceedings of the SC99 Conference on High Performance Networking and Computing, November 1999.
- An analysis of the impact of MPI overlap and independent progress