Efficient Interprocedural Data Placement Optimisation in a Parallel Library

  • Conference paper
Languages, Compilers, and Run-Time Systems for Scalable Computers (LCR 1998)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1511))

Abstract

This paper describes a combination of methods which make interprocedural data placement optimisation available to parallel libraries. We propose a delayed-evaluation, self-optimising (DESO) numerical library for a distributed-memory multicomputer. Delayed evaluation allows us to capture the control flow of a user program from within the library at runtime, and to construct an optimised execution plan by propagating data placement constraints backwards through the directed acyclic graph (DAG) representing the computation to be performed.
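As an illustration only, the idea of recording library calls in a DAG and then pushing placement constraints backwards from the result might be sketched as follows. This is not the paper's implementation: the `Node` class, the `delayed` and `propagate_backwards` helpers, the placement labels `"row"` and `"col"`, and the rule that every operation wants its operands placed like its result are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One recorded (not yet executed) library call. Hypothetical structure."""
    op: str
    inputs: List["Node"] = field(default_factory=list)
    placement: Optional[str] = None  # e.g. "row" or "col"; None means undecided


def leaf(name: str, placement: Optional[str] = None) -> Node:
    """A data object that already exists, possibly with a fixed placement."""
    return Node(op=name, placement=placement)


def delayed(op: str, *inputs: Node) -> Node:
    """Record the call as a DAG node instead of executing it."""
    return Node(op=op, inputs=list(inputs))


def propagate_backwards(root: Node, required: str) -> List[Node]:
    """Push a placement requirement from the result towards the operands.

    Assumption for this sketch: each operation wants its operands in the
    same placement as its result. Returns the nodes whose existing
    placement conflicts, i.e. where a redistribution would be needed.
    """
    conflicts: List[Node] = []
    seen = set()
    stack = [(root, required)]
    while stack:
        node, want = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.placement is None:
            node.placement = want          # undecided: adopt the constraint
        elif node.placement != want:
            conflicts.append(node)         # fixed elsewhere: needs redistribution
        for child in node.inputs:
            stack.append((child, node.placement))
    return conflicts


# Usage: the user program calls the library; nothing is computed yet.
A = leaf("A", placement="row")   # matrix already distributed by rows
x = leaf("x")                    # vector with no placement chosen yet
y = delayed("matvec", A, x)
z = delayed("add", y, x)

# When z is finally needed in "col" placement, constraints flow backwards.
needs_redistribution = propagate_backwards(z, "col")
```

In this toy run, `x` and `y` simply adopt the required placement, while `A` is reported as the one object whose existing placement disagrees with what the computation wants.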

Our strategy for optimising data placements at runtime has three parts: an efficient representation for data distributions; a greedy optimisation algorithm which, because of delayed evaluation, can take account of the full context of operations; and re-use of the results of previous runtime optimisations for contexts we have encountered before. We show performance figures for our library on a cluster of Pentium II Linux workstations. These demonstrate that the overhead of our delayed evaluation method is very small, and show both the parallel speedup we obtain and the benefit of the optimisations we describe.
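The re-use of earlier optimisation results can be pictured as a cache keyed on the context of operations. The sketch below is a hedged illustration under stated assumptions, not the library's actual mechanism: `PlanCache`, the tuple encoding of a context, and the stand-in `toy_optimise` function are all hypothetical names introduced here.

```python
from typing import Callable, Dict, Tuple

# Assumed encoding of a context: a sequence of (operation name, operand ids).
Context = Tuple[Tuple[str, Tuple[int, ...]], ...]
Plan = Tuple[str, ...]


class PlanCache:
    """Remember the plan computed for each context seen so far."""

    def __init__(self, optimise: Callable[[Context], Plan]) -> None:
        self._optimise = optimise
        self._plans: Dict[Context, Plan] = {}
        self.hits = 0
        self.misses = 0

    def plan_for(self, context: Context) -> Plan:
        if context in self._plans:
            self.hits += 1                 # seen before: skip optimisation
        else:
            self.misses += 1               # new context: optimise once, store
            self._plans[context] = self._optimise(context)
        return self._plans[context]


def toy_optimise(context: Context) -> Plan:
    """Stand-in for a real optimiser: one placement label per operation."""
    return tuple("row" for _ in context)


# Usage: the same sequence of calls, as in a loop body, is optimised once.
cache = PlanCache(toy_optimise)
loop_body: Context = (("matvec", (0, 1)), ("add", (2, 1)))
for _ in range(3):
    plan = cache.plan_for(loop_body)
```

With a loop body that repeats unchanged, only the first iteration pays for optimisation; later iterations look the plan up.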

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Beckmann, O., Kelly, P.H.J. (1998). Efficient Interprocedural Data Placement Optimisation in a Parallel Library. In: O’Hallaron, D.R. (eds) Languages, Compilers, and Run-Time Systems for Scalable Computers. LCR 1998. Lecture Notes in Computer Science, vol 1511. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49530-4_9

  • DOI: https://doi.org/10.1007/3-540-49530-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65172-7

  • Online ISBN: 978-3-540-49530-7

  • eBook Packages: Springer Book Archive
