DOI: 10.1145/1006209.1006251

An analysis of the impact of MPI overlap and independent progress

Published: 26 June 2004

Abstract

The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of MPI to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have been poorly studied at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. This paper extends this previous work by further qualifying the source of the performance advantage (offload, overlap, or independent progress).
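
To make the terminology concrete, the sketch below (an illustrative example, not code from the paper) shows what overlap means at the application level: communication is initiated with nonblocking MPI calls, computation proceeds while the transfer is outstanding, and completion is checked afterwards. Whether the message actually advances during the compute phase depends on the MPI implementation providing overlap and independent progress, for example via a progress thread or NIC offload, which is precisely the property the paper studies.

```c
/* Minimal sketch (not from the paper): overlapping computation with
 * communication using nonblocking MPI point-to-point operations.
 * Rank 0 sends a buffer to rank 1; both ranks do local work while the
 * transfer is outstanding. Whether the message makes progress during
 * compute() depends on the MPI library supporting overlap and
 * independent progress (e.g., a progress thread or NIC offload). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

static void compute(double *x, int n)
{
    /* Stand-in for application work that does not enter the MPI library. */
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 1.000001 + 0.5;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    double *msg  = malloc(N * sizeof *msg);
    double *work = malloc(N * sizeof *work);
    for (int i = 0; i < N; i++) { msg[i] = (double)i; work[i] = 1.0; }

    MPI_Request req;
    if (rank == 0)
        MPI_Isend(msg, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Irecv(msg, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

    /* Computation intended to overlap the outstanding transfer.
     * With independent progress, the message moves while we are here;
     * without it, most of the transfer happens inside MPI_Wait below. */
    if (rank <= 1) {
        compute(work, N);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    if (rank == 1)
        printf("received %d doubles, last = %g\n", N, msg[N - 1]);

    free(msg);
    free(work);
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and run with at least two ranks, timing the compute/wait phase against an equivalent blocking MPI_Send/MPI_Recv version is a common way to expose how much overlap a given implementation actually delivers.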



Published In

ICS '04: Proceedings of the 18th Annual International Conference on Supercomputing
June 2004
360 pages
ISBN: 1581138393
DOI: 10.1145/1006209


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MPI
  2. message passing
  3. overlap

Qualifiers

  • Article


Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

