Abstract
This paper describes a combination of methods that makes interprocedural data placement optimisation available to parallel libraries. We propose a delayed-evaluation, self-optimising (DESO) numerical library for a distributed-memory multicomputer. Delayed evaluation allows us to capture the control flow of a user program from within the library at runtime, and to construct an optimised execution plan by propagating data placement constraints backwards through the DAG representing the computation to be performed.
Our strategy for optimising data placements at runtime has three parts: an efficient representation for data distributions; a greedy optimisation algorithm which, because of delayed evaluation, can take account of the full context of operations; and re-use of the results of previous runtime optimisations on contexts we have encountered before. We show performance figures for our library on a cluster of Pentium II Linux workstations. These demonstrate that the overhead of our delayed-evaluation method is very small, and show both the parallel speedup we obtain and the benefit of the optimisations we describe.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Beckmann, O., Kelly, P.H.J. (1998). Efficient Interprocedural Data Placement Optimisation in a Parallel Library. In: O’Hallaron, D.R. (eds) Languages, Compilers, and Run-Time Systems for Scalable Computers. LCR 1998. Lecture Notes in Computer Science, vol 1511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49530-4_9
DOI: https://doi.org/10.1007/3-540-49530-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65172-7
Online ISBN: 978-3-540-49530-7
eBook Packages: Springer Book Archive