DOI: 10.1145/1542275.1542321
research-article

MPI-aware compiler optimizations for improving communication-computation overlap

Published: 08 June 2009

ABSTRACT

Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as a black box with unknown side effects and thus miss potential optimizations. This paper's contributions enable the development of an MPI-aware optimizing compiler that can perform transformations exploiting knowledge of MPI call effects to increase communication-computation overlap. We formulate a set of data flow equations and rules to describe the side effects of key MPI functions so an MPI-aware compiler can automatically assess the safety of transformations. After categorizing existing compiler transformations based on their effect on the application code, we present an optimization algorithm that specifies when and how to apply these optimizing transformations to achieve improved communication-computation overlap. By manually applying the optimization algorithm to kernels extracted from HYCOM and the NAS benchmarks, we show that even when transforming these highly optimized codes, execution time can be decreased by an average of over 30%.
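To make the idea of describing MPI side effects concrete, the following is a minimal sketch, not the paper's data flow equations. It assumes a toy model in which each statement carries a set of buffers it reads and a set it writes, and in which a nonblocking send reads its buffer (and a nonblocking receive writes its buffer) at any point until the matching MPI_Wait. The names `Stmt`, `isend`, `irecv`, `can_overlap`, and `sink_wait` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """A statement summarized by the buffers it reads and writes (toy model)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def isend(buf):
    # Assumption: an in-flight nonblocking send may read `buf` until its wait.
    return Stmt(f"MPI_Isend({buf})", reads=frozenset({buf}))


def irecv(buf):
    # Assumption: an in-flight nonblocking receive may write `buf` until its wait.
    return Stmt(f"MPI_Irecv({buf})", writes=frozenset({buf}))


def can_overlap(comm, stmt):
    """True if `stmt` may execute while `comm` is still in flight."""
    # The statement must not write anything the library reads or writes,
    # and must not read anything the library may still be writing.
    if stmt.writes & (comm.reads | comm.writes):
        return False
    if stmt.reads & comm.writes:
        return False
    return True


def sink_wait(comm, stmts):
    """Split `stmts` into (before_wait, after_wait).

    Models sinking the MPI_Wait past consecutive statements until the
    first one that conflicts with the in-flight buffer.
    """
    for i, s in enumerate(stmts):
        if not can_overlap(comm, s):
            return stmts[:i], stmts[i:]
    return stmts, []


if __name__ == "__main__":
    recv = irecv("halo")
    body = [
        Stmt("update_interior", reads=frozenset({"u"}), writes=frozenset({"v"})),
        Stmt("update_boundary", reads=frozenset({"halo", "u"}), writes=frozenset({"v"})),
    ]
    before, after = sink_wait(recv, body)
    print([s.name for s in before], [s.name for s in after])
```

In this sketch the interior update touches no communicated buffer, so it can be overlapped with the receive, while the boundary update reads the receive buffer and must stay after the wait. A real compiler would also need aliasing, control flow, and message-matching information, none of which this toy model captures.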


Published in

ICS '09: Proceedings of the 23rd international conference on Supercomputing
June 2009
544 pages
ISBN: 9781605584980
DOI: 10.1145/1542275

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 584 of 2,055 submissions (28%)
