Abstract
Compilers perform instruction scheduling to improve the performance of code on modern computer architectures. A superblock, a straight-line sequence of code with a single entry point and multiple possible exit points, is a commonly used scheduling region within compilers. Superblock scheduling is NP-complete and is done suboptimally in production compilers using a greedy algorithm coupled with a heuristic. Recently, exact schedulers have also been proposed. In this paper, we perform an extensive computational study of heuristic and exact techniques for scheduling superblocks. Our study extends previous work by using a more realistic architectural model, by not assuming perfect profile information, and by systematically investigating the case where profile information is not available. Our experimental results show that heuristics can be brittle, and that what looks promising under idealized (but unrealistic) conditions may not be robust in practice. Moreover, for the case where profile information is not available, some methods clearly dominate. Notably, a much inferior method is deployed in at least one existing compiler.
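To make the greedy approach mentioned above concrete, the following is a minimal sketch of a list scheduler for a single superblock. It is not the implementation evaluated in the paper. The `Instr` type, the function names, the example block, and the critical-path priority are all hypothetical choices made for illustration, and the cost function (each exit's probability multiplied by the cycle in which that exit issues) is assumed here as one common formulation of the objective.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    # Hypothetical instruction record, for illustration only.
    name: str
    latency: int = 1
    preds: list = field(default_factory=list)  # names of instructions it depends on
    exit_prob: float = 0.0                     # > 0 only for exit (branch) instructions


def critical_path(instrs):
    """Longest latency-weighted path from each instruction to the end of the block."""
    succs = {i.name: [] for i in instrs}
    for i in instrs:
        for p in i.preds:
            succs[p].append(i.name)
    dist = {}
    # Assumes instrs is in program (topological) order, so walk it backwards.
    for i in reversed(instrs):
        dist[i.name] = i.latency + max((dist[s] for s in succs[i.name]), default=0)
    return dist


def list_schedule(instrs, priority, issue_width=1):
    """Greedy cycle-by-cycle scheduling: issue the highest-priority ready instructions."""
    latency = {i.name: i.latency for i in instrs}
    issued = {}                # instruction name -> issue cycle (1-based)
    remaining = list(instrs)
    cycle = 1
    while remaining:
        # An instruction is ready once every predecessor's result is available.
        ready = [i for i in remaining
                 if all(p in issued and issued[p] + latency[p] <= cycle
                        for p in i.preds)]
        ready.sort(key=lambda i: priority[i.name], reverse=True)
        for i in ready[:issue_width]:
            issued[i.name] = cycle
            remaining.remove(i)
        cycle += 1
    return issued


def weighted_completion_time(instrs, issued):
    """Assumed cost: sum over exits of exit probability times issue cycle."""
    return sum(i.exit_prob * issued[i.name] for i in instrs if i.exit_prob > 0)


# A made-up superblock with two exits; br2 depends on br1 to keep branch order.
block = [
    Instr("ld1", latency=3),
    Instr("add1", preds=["ld1"]),
    Instr("br1", preds=["add1"], exit_prob=0.3),
    Instr("mul", latency=2),
    Instr("st", preds=["mul"]),
    Instr("br2", preds=["st", "br1"], exit_prob=0.7),
]
issued = list_schedule(block, critical_path(block), issue_width=1)
cost = weighted_completion_time(block, issued)
print(issued, cost)
```

Passing a different `priority` mapping to `list_schedule` is enough to try another heuristic on the same block. The exit probabilities stand in for profile information, so changing them shows how the cost of a fixed schedule shifts when the profile is inaccurate.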


Notes
See Sect. 2 for detailed definitions and explanations of terms in computer architecture and instruction scheduling.
See Sect. 3 for a detailed comparison to previous experimental studies.
This is especially true in the case of instruction scheduling, which rarely or never leads to decreased performance. Of course, in isolation a small improvement of 2–3% would not be worthwhile. However, instruction scheduling is just one of many optimizations that a compiler performs. Individually, most or all such optimizations will each lead to small improvements that are inconsequential in themselves but cumulatively offer a significant impact on performance.
Acknowledgements
This research was supported by an IBM Center for Advanced Studies (CAS) Fellowship, an NSERC Postgraduate Scholarship, and an NSERC CRD Grant. The experimental results were obtained using the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET). The authors are grateful to the anonymous referees for their comments, which prompted us to re-examine some of our assumptions and to improve the paper.
Cite this article
Chase, M., Malik, A.M., Russell, T. et al. A computational study of heuristic and exact techniques for superblock instruction scheduling. J Sched 15, 743–756 (2012). https://doi.org/10.1007/s10951-012-0276-y
DOI: https://doi.org/10.1007/s10951-012-0276-y