Advanced loop optimizations for parallel computers

  • Session 4A: Compilers and Restructuring Techniques I
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 297)

Abstract

So far, most work on program dependence analysis has concentrated on compile-time techniques, which are not always accurate and are often conservative. By coupling the compiler's ability to perform elaborate program optimizations with the run-time system's more accurate knowledge of certain program characteristics, we can uncover and exploit even more parallelism in ordinary programs. By performing run-time dependence checking, certain types of loops that were previously treated as serial can be executed concurrently. This paper presents a run-time dependence checking scheme and a new compiler optimization aimed at parallelizing serial loops. In particular, we present cycle shrinking, a compiler transformation that "shrinks" the dependence distances in serial loops, allowing parts of such loops to execute concurrently. Code reordering for minimizing communication and subscript blocking are also discussed briefly.
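
To make cycle shrinking concrete, the sketch below (in C, since the paper's text is not reproduced here) blocks a loop in which every loop-carried dependence has distance k into serial groups of k iterations; within a group no iteration depends on another, so each group may execute concurrently. The array names, the distance k, and the OpenMP pragma are illustrative assumptions, not the paper's own formulation.

    #include <stddef.h>

    /* Hypothetical serial loop: every loop-carried dependence has
       distance k >= 1, i.e. a value written in iteration i is first
       read in iteration i + k. Arrays a and b hold n + k elements. */
    void serial_loop(double *a, double *b, const double *c,
                     size_t n, size_t k)
    {
        for (size_t i = 0; i < n; i++) {
            a[i + k] = b[i] + c[i];   /* read again k iterations later */
            b[i + k] = a[i] * 2.0;    /* read again k iterations later */
        }
    }

    /* After cycle shrinking: the outer loop steps serially over groups
       of k iterations; within one group no iteration depends on another,
       so the inner loop may run concurrently (expressed here with an
       OpenMP pragma; compile with -fopenmp or equivalent). */
    void shrunk_loop(double *a, double *b, const double *c,
                     size_t n, size_t k)
    {
        for (size_t j = 0; j < n; j += k) {        /* serial over groups */
            size_t hi = (j + k < n) ? j + k : n;   /* clip the last group */
            #pragma omp parallel for
            for (size_t i = j; i < hi; i++) {      /* independent iterations */
                a[i + k] = b[i] + c[i];
                b[i + k] = a[i] * 2.0;
            }
        }
    }

Since each group of k iterations executes concurrently, the transformed loop can approach a factor-of-k speedup over the fully serial original.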

This work was supported in part by the National Science Foundation under Grants No. NSF DCR84-10110 and NSF DCR84-06916, the U.S. Department of Energy under Grant No. DOE DE-FG02-85ER25001, and a donation from IBM.




Editor information

E. N. Houstis, T. S. Papatheodorou, C. D. Polychronopoulos


Copyright information

© 1988 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Polychronopoulos, C.D. (1988). Advanced loop optimizations for parallel computers. In: Houstis, E.N., Papatheodorou, T.S., Polychronopoulos, C.D. (eds) Supercomputing. ICS 1987. Lecture Notes in Computer Science, vol 297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-18991-2_15

  • DOI: https://doi.org/10.1007/3-540-18991-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-18991-6

  • Online ISBN: 978-3-540-38888-3
