A computational study of heuristic and exact techniques for superblock instruction scheduling

Chase, Michael; Malik, Abid M.; Russell, Tyrel; Oldford, R. Wayne; van Beek, Peter

doi:10.1007/s10951-012-0276-y

Published: 20 June 2012

Volume 15, pages 743–756 (2012)

Journal of Scheduling

Abstract

Compilers perform instruction scheduling to improve the performance of code on modern computer architectures. A superblock, a straight-line sequence of code with a single entry point and multiple possible exit points, is a commonly used scheduling region within compilers. Superblock scheduling is NP-complete, and production compilers solve it suboptimally using a greedy algorithm coupled with a heuristic. Recently, exact schedulers have also been proposed. In this paper, we perform an extensive computational study of heuristic and exact techniques for scheduling superblocks. Our study extends previous work by using a more realistic architectural model, by not assuming perfect profile information, and by systematically investigating the case where profile information is not available. Our experimental results show that heuristics can be brittle, and that what looks promising under idealized (but unrealistic) conditions may not be robust in practice. In addition, for the case where profile information is not available, some methods clearly dominate. Notably, a much inferior method is deployed in at least one existing compiler.
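To make the "greedy algorithm coupled with a heuristic" concrete, the following is a minimal sketch of list scheduling on a toy superblock. It is not the paper's scheduler or any compiler's implementation. The critical-path priority, the single-issue machine, the cost function (exit probability times the issue cycle of each exit branch), and the toy dependency graph with its probabilities are all assumptions chosen for illustration.

```python
from collections import defaultdict

def critical_path(succs, n):
    """Longest latency-weighted path from each instruction to any sink."""
    dist = [0] * n
    for i in reversed(range(n)):      # index order is assumed topological
        for j, lat in succs[i]:
            dist[i] = max(dist[i], lat + dist[j])
    return dist

def list_schedule(n, edges, issue_width=1):
    """Greedy list scheduling; returns the 1-based issue cycle of each instruction.

    edges: (i, j, latency) means j may issue no earlier than `latency`
    cycles after i. Assumes i < j for every edge (hypothetical input format).
    """
    succs = defaultdict(list)
    preds_left = [0] * n
    for i, j, lat in edges:
        succs[i].append((j, lat))
        preds_left[j] += 1
    priority = critical_path(succs, n)   # the heuristic (an assumed choice)
    earliest = [1] * n
    ready = [i for i in range(n) if preds_left[i] == 0]
    issue = [0] * n
    cycle, scheduled = 1, 0
    while scheduled < n:
        # Instructions whose operands are available now, best priority first.
        cands = sorted((i for i in ready if earliest[i] <= cycle),
                       key=lambda i: (-priority[i], i))
        for i in cands[:issue_width]:
            issue[i] = cycle
            ready.remove(i)
            scheduled += 1
            for j, lat in succs[i]:
                earliest[j] = max(earliest[j], cycle + lat)
                preds_left[j] -= 1
                if preds_left[j] == 0:
                    ready.append(j)
        cycle += 1
    return issue

def weighted_completion_time(issue, exits):
    """Assumed cost: sum over exits of exit probability * issue cycle."""
    return sum(prob * issue[branch] for branch, prob in exits.items())

# Toy superblock: instructions 2 and 5 are the exit branches.
edges = [(0, 2, 1), (1, 2, 1), (2, 5, 1), (3, 5, 3), (4, 5, 1)]
exits = {2: 0.3, 5: 0.7}              # made-up profile probabilities
issue = list_schedule(6, edges)
cost = weighted_completion_time(issue, exits)
print(issue, round(cost, 2))
```

Changing the `exits` probabilities without rescheduling gives a rough feel for what imperfect profile information means: the schedule was chosen for one profile but is charged under another.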

Notes

See Sect. 2 for detailed definitions and explanations of terms in computer architecture and instruction scheduling.
See Sect. 3 for a detailed comparison to previous experimental studies.
This is especially true in the case of instruction scheduling, which rarely or never leads to decreased performance. Of course, in isolation a small improvement of 2–3% would not be worthwhile. However, instruction scheduling is just one of many optimizations that a compiler performs. Individually, most or all such optimizations will each lead to small improvements that are inconsequential in themselves but cumulatively have a significant impact on performance.

Acknowledgements

This research was supported by an IBM Center for Advanced Studies (CAS) Fellowship, an NSERC Postgraduate Scholarship, and an NSERC CRD Grant. The experimental results were obtained using the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET). The authors are grateful to the anonymous referees for their comments, which prompted us to re-examine some of our assumptions and to improve the paper.

Author information

Authors and Affiliations

Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada
Michael Chase, Abid M. Malik, Tyrel Russell & Peter van Beek
Department of Statistics & Actuarial Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
R. Wayne Oldford

Corresponding author

Correspondence to Peter van Beek.

About this article

Cite this article

Chase, M., Malik, A.M., Russell, T. et al. A computational study of heuristic and exact techniques for superblock instruction scheduling. J Sched 15, 743–756 (2012). https://doi.org/10.1007/s10951-012-0276-y

Issue Date: December 2012
DOI: https://doi.org/10.1007/s10951-012-0276-y
