ABSTRACT
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as a black box with unknown side effects and thus miss potential optimizations. This paper's contributions enable the development of an MPI-aware optimizing compiler that exploits knowledge of MPI call effects to increase communication-computation overlap. We formulate a set of data-flow equations and rules that describe the side effects of key MPI functions, so that an MPI-aware compiler can automatically assess the safety of transformations. After categorizing existing compiler transformations by their effect on the application code, we present an optimization algorithm that specifies when and how to apply these transformations to achieve improved communication-computation overlap. By manually applying the optimization algorithm to kernels extracted from HYCOM and the NAS benchmarks, we show that even when transforming these highly optimized codes, execution time can be decreased by an average of over 30%.
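The kind of transformation the abstract describes can be sketched as a before/after pair: a blocking send is split into a nonblocking initiation and a completion wait, and computation that the data-flow analysis proves independent of the message buffer is placed between them. This is an illustrative sketch only; the function and variable names (`exchange_blocking`, `compute`, `halo`, `interior`) are hypothetical and not taken from the paper, though the MPI calls themselves are standard.

```c
#include <mpi.h>

/* Hypothetical local computation, assumed not to read or write `halo`. */
void compute(double *interior, int n);

/* Before: the send blocks, serializing communication and computation. */
void exchange_blocking(double *halo, double *interior, int n, int peer)
{
    MPI_Send(halo, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    compute(interior, n);        /* runs only after the send completes */
}

/* After the overlap transformation: MPI_Send is decomposed into
 * MPI_Isend + MPI_Wait, and the independent computation is hoisted
 * between them so it hides the communication latency. Safety hinges
 * on the analysis showing compute() never touches the send buffer. */
void exchange_overlapped(double *halo, double *interior, int n, int peer)
{
    MPI_Request req;
    MPI_Isend(halo, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);
    compute(interior, n);        /* overlaps with the in-flight send */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```

The same pattern applies on the receive side (`MPI_Recv` split into `MPI_Irecv` + `MPI_Wait`), where the constraint is that the hoisted computation must not read the receive buffer before the wait completes.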