
Interprocedural communication optimizations for distributed memory compilation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 892))

Abstract

Managing communication is a difficult problem in distributed memory compilation. When the exact data to be communicated cannot be determined at compile time, communication can still be optimized by runtime routines that generate a schedule for the communication. This leads to two optimization problems: placing communication so that data, once communicated, can be reused where possible, and placing schedule calls so that the result of runtime preprocessing can be reused for as many communications as possible. In large application codes, computation and communication are spread across multiple subroutines, so acceptable performance cannot be achieved without performing these optimizations across subroutine boundaries. In this paper, we present an interprocedural analysis framework for these two optimization problems. Our optimizations are based on a program abstraction we call the Control & Call Flow Graph, which extends the call graph abstraction by recording the control flow relations among the call sites within each subroutine. We show how both the communication placement and the schedule call placement problems can be solved by data-flow analysis on the Control & Call Flow Graph structure.
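The runtime preprocessing the abstract describes follows the inspector/executor pattern: an "inspector" examines the (compile-time-unknown) access pattern and builds a communication schedule, and an "executor" carries out the messages it describes; reusing a cached schedule is what makes schedule call placement pay off. The following is a minimal single-process sketch of that idea, not the paper's actual runtime system (the PARTI primitives); all names (`get_schedule`, `execute_gather`, `owner_of`, `fetch`) are hypothetical stand-ins.

```python
# Hypothetical sketch of inspector/executor schedule reuse.
# The cache plays the role of a well-placed schedule call: the expensive
# inspector phase runs once per distinct access pattern, not once per
# communication.
_schedule_cache = {}

def get_schedule(indices, owner_of):
    """Inspector: group indirect accesses by the processor that owns them.

    `indices` are global element indices to be fetched; `owner_of(g)` maps
    a global index to its owning processor. A cached schedule is returned
    when the same index pattern recurs.
    """
    key = tuple(indices)
    sched = _schedule_cache.get(key)
    if sched is None:
        sched = {}
        for pos, g in enumerate(indices):
            # Remember both the destination slot and the global index.
            sched.setdefault(owner_of(g), []).append((pos, g))
        _schedule_cache[key] = sched
    return sched

def execute_gather(sched, fetch):
    """Executor: perform the communication the schedule describes.

    `fetch(proc, globals)` stands in for the actual message exchange with
    processor `proc`; here it is just a callback.
    """
    result = {}
    for proc, pairs in sched.items():
        globs = [g for _, g in pairs]
        vals = fetch(proc, globs)
        for (pos, _), v in zip(pairs, vals):
            result[pos] = v
    return [result[i] for i in range(len(result))]
```

Under a block distribution (say `owner_of = lambda g: g // 4`), a second loop iteration with the same indirection array hits the cache and skips the inspector entirely; the paper's interprocedural analysis decides where such schedule calls and communications can safely be hoisted across subroutine boundaries.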

This work was supported by NSF under grant No. ASC 9213821 and by ONR under contract No. N00014-93-1-0158. The authors assume all responsibility for the contents of the paper.




Editor information

Keshav Pingali, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agrawal, G., Saltz, J. (1995). Interprocedural communication optimizations for distributed memory compilation. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025885


  • DOI: https://doi.org/10.1007/BFb0025885

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58868-9

  • Online ISBN: 978-3-540-49134-7

