Efficient Interprocedural Data Placement Optimisation in a Parallel Library

Beckmann, Olav; Kelly, Paul H. J.

doi:10.1007/3-540-49530-4_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1511)

Included in the following conference series:

International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers

Abstract

This paper describes a combination of methods which make interprocedural data placement optimisation available to parallel libraries. We propose a delayed-evaluation, self-optimising (DESO) numerical library for a distributed-memory multicomputer. Delayed evaluation allows us to capture the control-flow of a user program from within the library at runtime, and to construct an optimised execution plan by propagating data placement constraints backwards through the DAG representing the computation to be performed.
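The idea of delayed evaluation with backward constraint propagation can be illustrated with a minimal Python sketch. This is not the DESO library's API: the names `Node`, `leaf`, `add`, `scale`, `propagate` and `force`, and the placement labels `"row"` and `"col"`, are all hypothetical and chosen only for illustration. Library calls record DAG nodes instead of computing; when a result is forced, a placement requirement is pushed from the result back to the operands before anything is evaluated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One delayed library call (hypothetical structure, not the paper's)."""
    op: str
    inputs: List["Node"] = field(default_factory=list)
    value: Optional[list] = None
    placement: Optional[str] = None  # assumed labels such as "row" or "col"


def leaf(values):
    # A leaf holds data that already exists.
    return Node("leaf", value=list(values))


def add(a, b):
    # Delayed: record the operation in the DAG, do not compute it yet.
    return Node("add", [a, b])


def scale(a, k):
    # Delayed scaling; the scalar is stored as a one-element leaf.
    return Node("scale", [a, leaf([k])])


def propagate(node, required):
    """Push a placement requirement from a result back towards its operands.

    A node keeps the first placement it is given (a simple greedy rule).
    """
    if node.placement is None:
        node.placement = required
    for child in node.inputs:
        propagate(child, node.placement)


def evaluate(node):
    # Compute each node at most once, operands first.
    if node.value is None:
        args = [evaluate(child) for child in node.inputs]
        if node.op == "add":
            node.value = [x + y for x, y in zip(*args)]
        elif node.op == "scale":
            k = args[1][0]
            node.value = [k * x for x in args[0]]
    return node.value


def force(node, required="row"):
    # Plan placements over the whole captured DAG, then execute it.
    propagate(node, required)
    return evaluate(node)


x = leaf([1, 2, 3])
y = leaf([4, 5, 6])
r = scale(add(x, y), 2)       # nothing has been computed at this point
result = force(r, "col")      # placement "col" flows back to x and y
```

In this sketch the placement is only a label; in a real distributed setting it would determine how data is laid out across processors before the operations run.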

Our strategy for optimising data placements at runtime has three components: an efficient representation for data distributions; a greedy optimisation algorithm which, because of delayed evaluation, can take account of the full context of operations; and the re-use of results from previous runtime optimisations on contexts we have encountered before. We show performance figures for our library on a cluster of Pentium II Linux workstations. These demonstrate that the overhead of our delayed evaluation method is very small, and show both the parallel speedup we obtain and the benefit of the optimisations we describe.

Author information

Authors and Affiliations

Department of Computing, Imperial College, 180 Queen’s Gate, London, SW7 2BZ, UK
Olav Beckmann & Paul H. J. Kelly

Editor information

Editors and Affiliations

Computer Science and Electrical and Computer Engineering School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3891, USA
David R. O’Hallaron

About this paper

Cite this paper

Beckmann, O., Kelly, P.H.J. (1998). Efficient Interprocedural Data Placement Optimisation in a Parallel Library. In: O’Hallaron, D.R. (eds) Languages, Compilers, and Run-Time Systems for Scalable Computers. LCR 1998. Lecture Notes in Computer Science, vol 1511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49530-4_9

DOI: https://doi.org/10.1007/3-540-49530-4_9
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65172-7
Online ISBN: 978-3-540-49530-7
eBook Packages: Springer Book Archive