A computational study of heuristic and exact techniques for superblock instruction scheduling

Journal of Scheduling

Abstract

Compilers perform instruction scheduling to improve the performance of code on modern computer architectures. Superblocks, straight-line sequences of code with a single entry point and multiple possible exit points, are a commonly used scheduling region within compilers. Superblock scheduling is NP-complete, and is done suboptimally in production compilers using a greedy algorithm coupled with a heuristic. Recently, exact schedulers have also been proposed. In this paper, we perform an extensive computational study of heuristic and exact techniques for scheduling superblocks. Our study extends previous work by using a more realistic architectural model, by not assuming perfect profile information, and by systematically investigating the case where profile information is not available. Our experimental results show that heuristics can be brittle, and that what looks promising under idealized (but unrealistic) conditions may not be robust in practice. In addition, for the case where profile information is not available, some methods clearly dominate. Notably, a much inferior method is deployed in at least one existing compiler.
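To make the phrase "a greedy algorithm coupled with a heuristic" concrete, the following is a minimal sketch of greedy list scheduling on a toy superblock. It is an illustration under stated assumptions, not the method evaluated in the paper: the `Instr` class, the example block, the latencies, and the exit probabilities are all hypothetical, and the priority used (longest latency-weighted path to the end of the block) is the classic critical-path heuristic, chosen here only because it is simple. The schedule is scored by the weighted completion time of the exits, which is the usual way of costing a superblock schedule when exit probabilities come from profile information.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    latency: int = 1                             # cycles until the result is available (>= 1)
    succs: list = field(default_factory=list)    # names of instructions that depend on this one
    exit_prob: float = 0.0                       # > 0 only for branch (exit) instructions


def critical_path(block):
    """Longest latency-weighted path from each instruction to the end of the block."""
    memo = {}

    def dist(name):
        if name not in memo:
            ins = block[name]
            memo[name] = ins.latency + max((dist(s) for s in ins.succs), default=0)
        return memo[name]

    return {name: dist(name) for name in block}


def list_schedule(block, issue_width=1):
    """Greedy list scheduling: each cycle, issue the ready instructions
    with the highest priority, up to issue_width of them."""
    prio = critical_path(block)
    preds_left = {name: 0 for name in block}
    for ins in block.values():
        for s in ins.succs:
            preds_left[s] += 1
    earliest = {name: 0 for name in block}       # first cycle at which operands are ready
    issue = {}                                   # instruction name -> issue cycle
    cycle = 0
    while len(issue) < len(block):
        ready = [n for n in block
                 if n not in issue and preds_left[n] == 0 and earliest[n] <= cycle]
        ready.sort(key=lambda n: (-prio[n], n))  # highest priority first, name breaks ties
        for n in ready[:issue_width]:
            issue[n] = cycle
            for s in block[n].succs:
                preds_left[s] -= 1
                earliest[s] = max(earliest[s], cycle + block[n].latency)
        cycle += 1
    return issue


def weighted_completion_time(block, issue):
    """Cost of a schedule: sum over exits of exit probability times completion cycle."""
    return sum(ins.exit_prob * (issue[name] + ins.latency)
               for name, ins in block.items() if ins.exit_prob > 0)


# Toy superblock (all values are made up): two exits, b1 and b2.
# b1 -> b2 is a control dependence that keeps the branches in program order.
block = {
    "a":  Instr(latency=1, succs=["b1"]),
    "b1": Instr(latency=1, succs=["b2"], exit_prob=0.3),
    "c":  Instr(latency=3, succs=["d"]),
    "d":  Instr(latency=1, succs=["b2"]),
    "b2": Instr(latency=1, exit_prob=0.7),
}

schedule = list_schedule(block, issue_width=1)
cost = weighted_completion_time(block, schedule)
```

Note that the critical-path priority ignores the exit probabilities entirely, so it can delay a likely early exit in favour of a long dependence chain. Superblock-specific heuristics differ mainly in how they weigh such tradeoffs between exits, and an exact scheduler would instead search for the schedule that minimizes `cost`.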

Notes

  1. See Sect. 2 for detailed definitions and explanations of terms in computer architecture and instruction scheduling.

  2. See Sect. 3 for a detailed comparison to previous experimental studies.

  3. This is especially true in the case of instruction scheduling, which rarely or never leads to decreased performance. Of course, in isolation a small improvement of 2–3% would not be worthwhile. However, instruction scheduling is just one of many optimizations that a compiler performs. Individually, most or all such optimizations each lead to small improvements that are inconsequential in themselves but cumulatively have a significant impact on performance.

Acknowledgements

This research was supported by an IBM Center for Advanced Studies (CAS) Fellowship, an NSERC Postgraduate Scholarship, and an NSERC CRD Grant. The experimental results were obtained using the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET). The authors are grateful to the anonymous referees for their comments, which prompted us to re-examine some of our assumptions and to improve the paper.

Author information

Corresponding author

Correspondence to Peter van Beek.

About this article

Cite this article

Chase, M., Malik, A.M., Russell, T. et al. A computational study of heuristic and exact techniques for superblock instruction scheduling. J Sched 15, 743–756 (2012). https://doi.org/10.1007/s10951-012-0276-y

  • DOI: https://doi.org/10.1007/s10951-012-0276-y
