Abstract
To achieve high performance on processors featuring instruction-level parallelism (ILP), most compilers apply a set of heuristics locally. This can yield high performance on individual code fragments. Unfortunately, most optimizations also increase code size, which may lead to a net performance loss for the program as a whole. In this paper, we propose a Global Constraints-Driven Strategy (GCDS) for guiding code optimization. With GCDS, the final code optimization decision is made according to global rather than local criteria, such as performance, code size, or instruction cache behavior. The performance/code size trade-off is particularly important for embedded systems. We show how GCDS can be used to control code size while optimizing performance.
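As a rough illustration of deciding globally rather than locally, the sketch below picks one variant per code fragment so that total estimated cycles are minimized while total code size stays within a budget. It is not the GCDS algorithm itself: the function `choose_variants`, the variant names, and all size and cycle figures are hypothetical, and the selection is modeled here as a small multiple-choice knapsack solved by dynamic programming.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical variant description: (name, code size in bytes, estimated cycles).
Variant = Tuple[str, int, int]


def choose_variants(
    fragments: List[List[Variant]], size_budget: int
) -> Optional[Tuple[int, List[str]]]:
    """Pick exactly one variant per fragment, minimizing total cycles
    subject to total size <= size_budget. Returns None if infeasible."""
    # best maps "total size used so far" -> (total cycles, chosen variant names)
    best: Dict[int, Tuple[int, List[str]]] = {0: (0, [])}
    for variants in fragments:
        nxt: Dict[int, Tuple[int, List[str]]] = {}
        for used, (cycles, names) in best.items():
            for name, size, cost in variants:
                total = used + size
                if total > size_budget:
                    continue  # this combination violates the global constraint
                cand = (cycles + cost, names + [name])
                if total not in nxt or cand[0] < nxt[total][0]:
                    nxt[total] = cand
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda entry: entry[0])


# Two loops, each available in a plain and an unrolled form (made-up numbers).
fragments = [
    [("A:plain", 40, 1000), ("A:unroll4", 120, 600)],
    [("B:plain", 30, 800), ("B:unroll4", 100, 700)],
]
result = choose_variants(fragments, size_budget=160)
```

With these made-up numbers, unrolling both loops would need 220 bytes and exceed the 160-byte budget, so the global choice unrolls only loop A, where unrolling saves the most cycles. A purely local heuristic that unrolls every loop it can has no way to see that constraint.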
Cite this article
Rohou, E., Bodin, F., Eisenbeis, C. et al. Handling Global Constraints in Compiler Strategy. International Journal of Parallel Programming 28, 325–345 (2000). https://doi.org/10.1023/A:1007502921104