
A new algorithm for global optimization for parallelism and locality

  • Postlinear Loop Transformations
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 892)

Abstract

Converting sequential programs to execute on parallel computers is difficult because of the need to globally optimize for both parallelism and data locality. The choice of which loop nests to parallelize, and how, drastically affects data locality. Similarly, data distribution directives, such as DISTRIBUTE in High Performance Fortran (HPF), affect available parallelism and locality. What is needed is a systematic approach to converting programs to parallel form, based upon analysis that identifies opportunities for both parallelism and locality in one representation.
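To make the trade-off concrete, the following is a minimal C sketch (not taken from the paper, which works in terms of Fortran and HPF; the OpenMP-style pragmas are used here only as shorthand for "run this loop in parallel"). Parallelizing the outer loop of the nest gives each processor whole rows of the row-major arrays, which corresponds roughly to an HPF DISTRIBUTE(BLOCK, *) of the data; parallelizing the inner loop instead gives each processor strips of columns, corresponding roughly to DISTRIBUTE(*, BLOCK). The computation is identical, but the implied data distribution, and hence the locality, is not.

    #include <stdio.h>

    #define N 1024
    static double a[N][N], b[N][N];

    /* Outer loop parallel: each processor handles a block of whole rows of
     * a and b, so the implied distribution is by rows
     * (roughly DISTRIBUTE a(BLOCK, *) in HPF terms). */
    void update_by_rows(double s)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = s * b[i][j];
    }

    /* Inner loop parallel: each processor handles a strip of columns of
     * every row, so the implied distribution is by columns
     * (roughly DISTRIBUTE a(*, BLOCK)).  Same computation, different
     * locality on a machine where data ends up near the processor that
     * touches it. */
    void update_by_columns(double s)
    {
        for (int i = 0; i < N; i++) {
            #pragma omp parallel for
            for (int j = 0; j < N; j++)
                a[i][j] = s * b[i][j];
        }
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                b[i][j] = i + j;
        update_by_rows(2.0);
        update_by_columns(0.5);
        printf("a[1][1] = %f\n", a[1][1]);
        return 0;
    }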

This paper presents a global framework for optimizing parallelism and locality, based upon constraint solving for locality between potentially parallel loop nests. We outline the theory behind the framework and provide a global algorithm for parallelizing programs while optimizing for locality. We also present and analyze results from applying the algorithm to parallelize the Perfect benchmarks, targeted at the KSR-1. Unlike other approaches, we do not assume an explicit distribution of data to processors; the distribution is inferred from locality constraints and available parallelism. This approach works well for machines such as the KSR-1, where there is no explicit distribution of data. However, our approach could also be used to generate code with explicit data distribution for distributed-memory machines, for example by generating HPF.
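As an illustration of the kind of inter-nest locality constraint the framework solves for, here is a hedged C sketch of two loop nests that share an array (again an invented example, not code from the paper; the OpenMP pragmas stand in for "this loop is parallel"). If the first nest is parallelized over rows and the second over columns, the data each processor owns changes between nests and locality is lost; interchanging the second nest's loops, which is legal because its iterations are independent, lets both nests run parallel over rows. The by-row distribution then falls out of that consistent choice rather than being declared with a DISTRIBUTE directive, which is the spirit of the inference described above.

    #include <stdio.h>

    #define N 512
    static double a[N][N], b[N][N];

    /* Nest 1: parallel over i.  Each processor writes whole rows of a,
     * so the inferred distribution of a is by rows. */
    void nest1(void)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = (double)(i + j);
    }

    /* Nest 2 as written: parallelizing the outer j loop makes each
     * processor read a strip of columns of a, i.e. rows owned by every
     * other processor under nest 1's row distribution.  The locality
     * constraint between the two nests is violated. */
    void nest2_conflicting(void)
    {
        #pragma omp parallel for
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                b[i][j] = 2.0 * a[i][j];
    }

    /* Nest 2 after loop interchange: parallel over i, so each processor
     * reads only the rows of a it produced in nest 1 and writes the
     * matching rows of b.  Both nests now agree on a by-row distribution,
     * inferred from the choice of parallel loops rather than declared. */
    void nest2_consistent(void)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                b[i][j] = 2.0 * a[i][j];
    }

    int main(void)
    {
        nest1();
        nest2_consistent();
        printf("b[1][2] = %f\n", b[1][2]);
        return 0;
    }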




Editor information

Keshav Pingali, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Appelbe, B., Doddapaneni, S., Hardnett, C. (1995). A new algorithm for global optimization for parallelism and locality. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025875


  • DOI: https://doi.org/10.1007/BFb0025875


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58868-9

  • Online ISBN: 978-3-540-49134-7

