Advanced Code Generation for High Performance Fortran

Compiler Optimizations for Scalable Parallel Systems

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1808))

Summary

For data-parallel languages such as High Performance Fortran to achieve wide acceptance, parallelizing compilers must be able to provide consistently high performance for a broad spectrum of scientific applications. Although compilation of regular data-parallel applications for message-passing systems has been widely studied, current state-of-the-art compilers implement only a small number of key optimizations, and their implementations generally optimize programs using a “case-based” approach. For these reasons, current compilers are unable to provide consistently high levels of performance. In this paper, we describe techniques developed in the Rice dHPF compiler to address key code generation challenges that arise in achieving high performance for regular applications on message-passing systems. We focus on techniques required to implement advanced optimizations and to achieve consistently high performance with existing optimizations. Many of the core communication analysis and code generation algorithms in dHPF are expressed in terms of abstract equations manipulating integer sets. This approach enables general yet simple implementations of sophisticated optimizations, making it more practical to include a comprehensive set of optimizations in data-parallel compilers. It also enables the compiler to support much more aggressive computation partitioning algorithms than previous compilers. We therefore believe this approach can provide higher and more consistent levels of performance than are available today.
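The integer-set formulation mentioned above can be illustrated with a toy sketch. This is not dHPF code, and all function names and parameters here are hypothetical: for a 1D BLOCK-distributed array and a shifted stencil reference A(i-1), the data a processor must receive is simply the set difference between the indices it reads and the indices it owns.

```python
# Toy illustration (not dHPF code): communication analysis phrased as
# operations on integer sets, for a 1D BLOCK-distributed array A(1..n)
# and the stencil reference A(i-1). All names are hypothetical.

def block_owned(p, n, nprocs):
    """Indices of A(1..n) owned by processor p under a BLOCK distribution."""
    b = -(-n // nprocs)                        # ceil(n / nprocs): block size
    lo, hi = p * b + 1, min((p + 1) * b, n)
    return set(range(lo, hi + 1))

def recv_set(p, n, nprocs):
    """Non-local data processor p must receive to compute
    A(i) = f(A(i-1)) for the iterations it owns (owner-computes rule)."""
    owned = block_owned(p, n, nprocs)
    read = {i - 1 for i in owned if i - 1 >= 1}  # indices read via A(i-1)
    return read - owned                           # set difference = communication

# Processor 1 of 4, n = 16: owns {5..8}, reads {4..7}, must receive {4}.
print(recv_set(1, 16, 4))                         # -> {4}
```

In dHPF these sets are symbolic (parameterized by the processor number and problem size) and are manipulated with a Presburger-arithmetic library rather than enumerated, but the algebra is the same: communication, ownership, and iteration sets are all derived by composing set operations like the difference above.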


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Adve, V., Mellor-Crummey, J. (2001). Advanced Code Generation for High Performance Fortran. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_16

  • Print ISBN: 978-3-540-41945-7

  • Online ISBN: 978-3-540-45403-8
