Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers

Giesemann, Florian; Gerlach, Lukas; Payá-Vayá, Guillermo

doi:10.1007/s11265-019-01493-2

Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers

Published: 17 January 2020

Volume 92, pages 655–678, (2020)
Cite this article

Journal of Signal Processing Systems Aims and scope Submit manuscript

Florian Giesemann¹,
Lukas Gerlach¹ &
Guillermo Payá-Vayá ORCID: orcid.org/0000-0003-3503-8386¹

532 Accesses
6 Citations
Explore all metrics

Abstract

Code generation for VLIW processors includes several optimization problems like code optimization, instruction scheduling, and register allocation. The high complexity of these problems usually does not allow the computation of the optimal solution. Instead, optimization techniques, e.g., based on heuristics, are used to find acceptable solutions in a reasonable time. List scheduling is a well known heuristic-based microcode compaction method, that bases its scheduling decisions on weights derived from dependency analysis of the input program. Additional information and methods have to be used in order to reach better code compaction. Also, more sophisticated code optimization and register allocation support better code compaction. In this paper, evolutionary algorithms are used as dynamic heuristics in code generation, which allows dynamic adaption to the given input program and target processor configuration. Three evolutionary algorithms for operation merging, instruction scheduling, and register allocation are presented and evaluated on an exemplary image processing application, which shows different processing characteristics in the subroutines. They outperform code generation based on static heuristics and allow compilation for restricted target architectures that cannot be handled by the static heuristics.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Figure 7

Figure 11

An Effective Framework of Program Optimization for High Performance Computing

Compiling for VLIW DSPs

A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details

Article 09 January 2017

Notes

References

Bahuleyan, J, Nagpal, R, Srikant, YN. (2010). Integrated energy-aware cyclic and acyclic scheduling for clustered VLIW processors. In: International symposium on parallel & distributed processing, Workshops and Phd Forum (IPDPSW).
Briggs, P. (1992). Register allocation via graph coloring. PhD thesis: Rice University.
Google Scholar
Codina, JM, Sanchez, J, Gonzalez, A. (2001). A unified modulo scheduling and register allocation technique for clustered processors. In: Proceedings of international conference on parallel architectures and compilation techniques.
Eriksson, MV, Skoog, O, Kessler, CW. (2008). Optimal vs. heuristic integrated code generation for clustered VLIW architectures. In Proceedings of 11th international workshop on software & compilers for embedded systems (SCOPES ’08). New York: ACM.
Fisher, JA. (1979). Optimization of horizontal microcode within and beyond basic blocks: an application of processor scheduling with resources. Tech. rep.: NY Univ. Courant Mathem.and Comp.Lab.
Google Scholar
Gerlach, L, Marquardt, D, Payá Vayá, G, Liu, S, Weißbrich, M, Doclo, S, Blume, H. (2017). Analyzing the trade-off between power consumption and beamforming algorithm performance using a hearing aid ASIP. In: Proceedings of international conference on embedded computer systems: Architectures, Modeling and simulation.
Giesemann, F, Payá-Vayá, G., Blume, H. (2012). A Hardware/software environment for specializing dynamic reconfigurable generic vliw-simd asip architecture. In: ICT. OPEN 2012 conference.
Giesemann, F, Payá-Vayá, G., Blume, H, Limmer, M, Ritter, W. (2014). A comprehensive ASIC/FPGA prototyping environment for exploring embedded processing systems for advanced driver assistance applications. In: Proceedings of international conference on embedded computer systems: Architectures, Modeling, and simulation.
Giesemann, F, Payá-Vayá G., Gerlach, L, Blume, H, Pflug, F, von Voigt, G. (2017). Using genetic algorithm approach to reduce register file pressure during instruction scheduling. In: Proceedings of international conference on embedded computer systems: Architectures, Modeling and simulation.
Hennessy, JL, & Gross, T. (1983). Postpass code optimization of pipeline constraints. ACM Transactions on Programming Languages and Systems, 5(3), 422–448.
Article Google Scholar
Äijö, T., Jääskeläinen, P., Elomaa, T, Kultala, H, Takala, J. (2015). Integer linear programming-based scheduling for Transport Triggered Architectures. ACM Transactions on Architecture and Code Optimization, 12(4), 1–22. https://doi.org/10.1145/2845082.
Article Google Scholar
Kri, F, & Feeley, M. (2004). Genetic instruction scheduling and register allocation. In: XXIV international conference of the chilean computer science society.
Lee, C, Lee, JK, Hwang, T, Tsai, SC. (2003). Compiler optimization on VLIW instruction scheduling for low power. ACM Transaction on Design Automation of Electronic Systems, 8(2), 252–268.
Article Google Scholar
Leupers, R, & Bashford, S. (2000). Graph-based code selection techniques for embedded processors. ACM Transactions on Design Automation of Electronic Systems, 5(4), 794–814.
Article Google Scholar
Lin, YC, You, YP, Lee, JK. (2006). Register allocation for VLIW DSP processors with irregular register files. In: 12Th workshop on compilers for parallel computers (CPC 2006), Coruña, Spain.
Lorenz, M, Leupers, R, Marwedel, P, Drager, T, Fettweis, G. (2001). Low-energy DSP code generation using a genetic algorithm. In: Proceedings of the IEEE international conference on computer design: vlsi in computers and processors (ICCD), pp. 431–437. https://doi.org/10.1109/ICCD.2001.955062.
Lorenz, M, Wehmeyer, L, Dräger, T. (2002). Energy aware compilation for DSPs with SIMD instructions. In: Proceedings of the joint conference on Languages, compilers and tools for embedded systems software and compilers for embedded systems - LCTES/SCOPES ’02, ACM Press, pp. 94–101. https://doi.org/10.1145/513829.513847.
Lozano, RC, Carlsson, M, Drejhammar, F, Schulte, C. (2012). Constraint-based register allocation and instruction scheduling. In Lecture notes in computer science. https://doi.org/10.1007/978-3-642-33558-7_54 (pp. 750–766). Berlin: Springer.
Lozano, RC, Carlsson, M, Blindell, GH, Schulte, C. (2014). Combinatorial spill code optimization and ultimate coalescing. In: Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems - LCTES ’14, ACM Press, pp. 23–32. https://doi.org/10.1145/2597809.2597815.
Maldonado-Bascón, S., Lafuente-Arroyo, S, Gil-Jiménez, P., Gomez-Moreno, H, Lopez-Ferreras, F. (2007). Road-sign detection and recognition based on support vector machines. IEEE Transactions on Intelligent Transportation Systems, 8(2), 264–278. https://doi.org/10.1109/TITS.2007.895311.
Article Google Scholar
McGovern, A, Moss, E, Barto, AG. (2002). Building a basic block instruction scheduler with reinforcement learning and rollouts. Machine Learning, 49(2/3), 141–160. https://doi.org/10.1023/a:1017976211990.
Article Google Scholar
Nian, C, Yanxiang, H, Yong, C, Ximi, L, Qian, L. (2012). PSO Based instruction scheduling for low power. In: International conference on computer distributed control and intelligent environmental monitoring (CDCIEM), IEEE. https://doi.org/10.1109/cdciem.2012.129.
Nicolau, A, Potasman, R, Wang, H. (1992). Register allocation, renaming and their impact on fine-grain parallelism. In: Languages and Compilers for Parallel Computing, Springer-Verlag, pp. 218–235. https://doi.org/10.1007/bfb0038667.
Parikh, A, Kim, S, Kandemir, M, Vijaykrishnan, N, Irwin, M J. (2004). Instruction scheduling for low power. Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 37(1), 129–149. https://doi.org/10.1023/b:vlsi.0000017007.28247.f6.
Article Google Scholar
Payá-Vayá, G. (2011). Design and analysis of a generic VLIW processor for multimedia applications PhD thesis. Leibniz Universität Hannover: Institute of Microelectronic Systems.
Google Scholar
Payá-Vayá, G, Martín Langerwerf, J, Pirsch, P. (2005). Rapanui: Rapid prototyping for media processor architecture exploration. In Hämäläinen, T D, Pimentel, A D, Takala, J, Vassiliadis, S (Eds.) Embedded computer systems: architectures, modeling, and simulation, lecture notes in computer science. https://doi.org/10.1007/11512622_5, (Vol. 3553 pp. 249–265). Berlin: Springer.
Payá-Vayá, G., Martín-Langerwerf, J., Taptimthong, P., Pirsch, P. (2007). Design space exploration of media processors: A parameterized scheduler. In: 2007 international conference on embedded computer systems: Architectures, Modeling and simulation, pp. 41–49. https://doi.org/10.1109/ICSAMOS.2007.4285732.
Payá-Vayá, G., Martín Langerwerf, J., Giesemann, F., Blume, H., Pirsch, P. (2009). Instruction merging to increase parallelism in VLIW architectures. In: International symposium on system-on-chip (SOC), pp. 143–146. https://doi.org/10.1109/SOCC.2009.5335660.
Payá-Vayá, G., Martín-langerwerf, J., Pirsch, P. (2009). A multi-shared register file structure for VLIW processors. Journal of Signal Processing Systems, 58(2), 215–231. https://doi.org/10.1007/s11265-009-0355-2.
Article Google Scholar
Payá-Vayá, G, Martín-Langerwerf, J, Blume, H, Pirsch, P. (2010). A forwarding-sensitive instruction scheduling approach to reduce register file constraints in VLIW architectures. In: 21st IEEE international conference on application-specific systems, Architectures and processors (ASAP), IEEE. https://doi.org/10.1109/asap.2010.5541015.
She, D, He, Y, Mesman, B, Corporaal, H. (2012). Scheduling for register file energy minimization in explicit datapath architectures. In: Proceedings of the IEEE design, automation & test in europe conference & exhibition (DATE), pp. 388–393 . https://doi.org/10.1109/date.2012.6176502.
Su, X, Wu, H, Xue, J. (2017). An efficient WCET-aware instruction scheduling and register allocation approach for clustered VLIW processors. ACM Transactions on Embedded Computing Systems, 16(5s), 1–21. https://doi.org/10.1145/3126524.
Article Google Scholar
Wang, M, Wang, Y, Liu, D, Qin, Z, Shao, Z. (2010). Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors. Journal of Systems and Software, 83(5), 772–785. https://doi.org/10.1016/j.jss.2009.11.727.
Article Google Scholar
Wilken, K, Liu, J, Heffernan, M. (2000). Optimal instruction scheduling using integer programming. SIGPLAN Not, 35(5), 121–133. https://doi.org/10.1145/358438.349318.
Article Google Scholar
Xiao, S, & Lai, EMK. (2005). Instruction scheduling of VLIW architectures for balanced power consumption. In: Proceedings of the conference on Asia South Pacific design automation (ASP-DAC ’05), ACM Press, pp. 824–829. https://doi.org/10.1145/1120725.1121027.
Xiao, S, & Lai, EMK. (2007). VLIW instruction scheduling for minimal power variation. ACM Transactions on Architecture and Code Optimization, 4(3), 18–es. https://doi.org/10.1145/1275937.1275942.
Article Google Scholar
Yun, HS, & Kim, J. (2001). Power-aware modulo scheduling for high-performance VLIW processors. In: ISLPED ’01: proceedings of the 2001 international symposium on low power electronics and design (IEEE Cat. No.01TH8581), ACM, pp. 40–45. https://doi.org/10.1109/lpe.2001.945369.
Zeitlhofer, T, & Wess, B. (1999). Operation scheduling for parallel functional units using genetic algorithms. In: IEEE international conference on acoustics, speech, and signal processing. Proceedings. ICASSP99 (Cat. No.99CH36258), vol. 4, pp. 1997–2000. https://doi.org/10.1109/ICASSP.1999.758319.
Zhong, S, Wei, J, Guo, W, Wang, Z. (2010). Instruction scheduling using genetic algorithm with taboo search for TTA-like processors. In: International conference on computer design and applications. IEEE. https://doi.org/10.1109/iccda.2010.5541389.

Download references

Acknowledgements

This work was partly funded by the German Research Council (DFG) under project number PA 2762/2-1.

Author information

Authors and Affiliations

Institute of Microelectronic Systems, Leibniz Universität Hannover, Appelstraße 4, 30167, Hannover, Germany
Florian Giesemann, Lukas Gerlach & Guillermo Payá-Vayá

Authors

Florian Giesemann
View author publications
You can also search for this author in PubMed Google Scholar
Lukas Gerlach
View author publications
You can also search for this author in PubMed Google Scholar
Guillermo Payá-Vayá
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Guillermo Payá-Vayá.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Giesemann, F., Gerlach, L. & Payá-Vayá, G. Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers. J Sign Process Syst 92, 655–678 (2020). https://doi.org/10.1007/s11265-019-01493-2

Download citation

Received: 14 June 2019
Revised: 26 September 2019
Accepted: 02 October 2019
Published: 17 January 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11265-019-01493-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers

Abstract

Access this article

Similar content being viewed by others

An Effective Framework of Program Optimization for High Performance Computing

Compiling for VLIW DSPs

A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Evolutionary Algorithms for Instruction Scheduling, Operation Merging, and Register Allocation in VLIW Compilers

Abstract

Access this article

Similar content being viewed by others

An Effective Framework of Program Optimization for High Performance Computing

Compiling for VLIW DSPs

A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation