Using profile information to assist advanced compiler optimization and scheduling

  • Conference paper

Languages and Compilers for Parallel Computing (LCPC 1992)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 757)

Abstract

Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations that expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism along the frequent execution scenarios at the expense of the less frequent execution sequences. Profile information identifies these important execution sequences in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-based transformations have been incorporated into the IMPACT compiler. These transformations include global optimization, acyclic global scheduling, and software pipelining. The effectiveness of these profile-based techniques is evaluated for a range of superscalar and VLIW processors.
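To make the idea of using control-flow profile information to identify frequent execution sequences concrete, the following is a minimal sketch of greedy trace selection over a profiled control-flow graph. It is an illustration only and is not taken from the paper or from the IMPACT compiler: the function name `select_traces`, the inputs `block_counts` and `edge_counts`, and the threshold `min_prob` are all hypothetical, and the forward-only growth rule is a simplification.

```python
from typing import Dict, Hashable, List, Tuple

Block = Hashable


def select_traces(
    block_counts: Dict[Block, int],
    edge_counts: Dict[Tuple[Block, Block], int],
    min_prob: float = 0.6,  # hypothetical threshold, chosen for illustration
) -> List[List[Block]]:
    """Greedily group basic blocks into traces using profile counts.

    block_counts maps each basic block to its execution count.
    edge_counts maps each (source, destination) edge to its taken count.
    """
    visited = set()
    traces = []
    # Seed traces from the most frequently executed blocks first.
    for seed in sorted(block_counts, key=block_counts.get, reverse=True):
        if seed in visited:
            continue
        trace = [seed]
        visited.add(seed)
        current = seed
        while True:
            # Collect the profiled outgoing edges of the current block.
            succs = {d: n for (s, d), n in edge_counts.items() if s == current}
            total = sum(succs.values())
            if total == 0:
                break  # no profiled successor
            best = max(succs, key=succs.get)
            # Stop at blocks already placed or at weakly biased branches.
            if best in visited or succs[best] / total < min_prob:
                break
            trace.append(best)
            visited.add(best)
            current = best
        traces.append(trace)
    return traces


if __name__ == "__main__":
    # A diamond-shaped CFG in which the A -> B -> D path dominates.
    counts = {"A": 100, "B": 90, "C": 10, "D": 100}
    edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
    print(select_traces(counts, edges))  # [['A', 'B', 'D'], ['C']]
```

In this sketch the frequent path A, B, D becomes a single trace that a scheduler could optimize as a unit, while the rarely executed block C is left in its own trace, which mirrors the trade-off of favoring frequent sequences at the expense of infrequent ones.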

Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, W. et al. (1993). Using profile information to assist advanced compiler optimization and scheduling. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1992. Lecture Notes in Computer Science, vol 757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57502-2_38

  • DOI: https://doi.org/10.1007/3-540-57502-2_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57502-3

  • Online ISBN: 978-3-540-48201-7

  • eBook Packages: Springer Book Archive
