Abstract
Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to achieve high performance. Compile-time code transformations that expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism along the frequent execution sequences at the expense of the less frequent ones. Profile information identifies these important execution sequences in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-based transformations have been incorporated into the IMPACT compiler. These transformations include global optimization, acyclic global scheduling, and software pipelining. The effectiveness of these profile-based techniques is evaluated for a range of superscalar and VLIW processors.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, W. et al. (1993). Using profile information to assist advanced compiler optimization and scheduling. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1992. Lecture Notes in Computer Science, vol 757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57502-2_38
DOI: https://doi.org/10.1007/3-540-57502-2_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57502-3
Online ISBN: 978-3-540-48201-7
eBook Packages: Springer Book Archive