Improving Code Density with Variable Length Encoding Aware Instruction Scheduling

Kultala, Heikki; Viitanen, Timo; Jääskeläinen, Pekka; Helkala, Janne; Takala, Jarmo

doi:10.1007/s11265-015-1081-6

Published: 07 December 2015

Volume 84, pages 435–446 (2016)

Journal of Signal Processing Systems

Abstract

Variable length encoding can considerably decrease code size in VLIW processors by reducing the number of bits wasted on encoding No Operations (NOPs). A processor may have different instruction templates in which different execution slots are implicitly NOPs, but the templates may not support every combination of NOPs. The compiler can improve the efficiency of the NOP encoding by placing NOPs so that the use of implicit NOPs is maximized. Two methods of optimizing the use of the implicit NOP slots are evaluated: (a) prioritizing function units that have fewer implicit NOPs associated with them, and (b) a post-pass to the instruction scheduler that exploits the slack of the schedule by moving operations with slack into different instruction words so that the available instruction templates are better utilized. Three policies for selecting the basic blocks to which function unit (FU) prioritization is applied are also analyzed: always; always outside inner loops; and outside inner loops, but only in those basic blocks where testing showed that it decreased code size. The post-pass optimizer alone saved an average of 2.4 % and a maximum of 10.5 % of instruction memory, without performance loss. Prioritizing function units only in those basic blocks where it helped gave best-case instruction memory savings of 10.7 % and average savings of 3.0 %, in exchange for an average slowdown of 0.3 %. Applying both optimizations together gave a best-case code size decrease of 12.2 % and an average decrease of 5.4 %, while performance decreased by 0.1 % on average.
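The post-pass idea in (b) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the three-slot machine, the template set and its bit widths, the names `TEMPLATES`, `word_size`, `program_size` and `post_pass`, and the simplification that each operation's slack window is given as a fixed range of legal cycles. The sketch greedily moves an operation within its slack window when that lets the instruction words use smaller templates, without changing the schedule length.

```python
# Hypothetical 3-slot VLIW. Each template lists the slots it encodes explicitly;
# every other slot is an implicit NOP. The bit widths are invented for this sketch.
TEMPLATES = {
    frozenset({0, 1, 2}): 48,  # full-width word, no implicit NOPs
    frozenset({0, 1}): 32,     # slot 2 is an implicit NOP
    frozenset({2}): 16,        # slots 0 and 1 are implicit NOPs
    frozenset(): 8,            # every slot is an implicit NOP
}


def word_size(used_slots):
    """Size in bits of the smallest template whose explicit slots cover used_slots."""
    used = frozenset(used_slots)
    return min(size for slots, size in TEMPLATES.items() if used <= slots)


def program_size(schedule, n_cycles):
    """Total size of a schedule that maps operation name -> (cycle, slot)."""
    total = 0
    for cycle in range(n_cycles):
        used = {slot for (c, slot) in schedule.values() if c == cycle}
        total += word_size(used)
    return total


def post_pass(schedule, slack, n_cycles):
    """Greedily move operations within their slack windows when code size shrinks.

    slack maps operation name -> (earliest_cycle, latest_cycle). The windows are
    assumed to be precomputed so that any cycle inside them respects dependencies.
    """
    best = dict(schedule)
    for op, (earliest, latest) in slack.items():
        _, slot = best[op]
        for cycle in range(earliest, latest + 1):
            # Skip cycles where another operation already occupies this slot.
            if any(o != op and pos == (cycle, slot) for o, pos in best.items()):
                continue
            trial = dict(best)
            trial[op] = (cycle, slot)
            if program_size(trial, n_cycles) < program_size(best, n_cycles):
                best = trial
    return best


# Cycle 0 uses slots {0, 2} and needs the 48-bit template; cycle 1 uses slot {1}.
schedule = {"a": (0, 0), "c": (0, 2), "b": (1, 1)}
slack = {"a": (0, 1)}  # only "a" may move, to cycle 0 or cycle 1

optimized = post_pass(schedule, slack, n_cycles=2)
print(program_size(schedule, 2))   # 80
print(program_size(optimized, 2))  # 48
```

Moving `a` into cycle 1 leaves slot 2 alone in cycle 0 (16 bits) and slots 0 and 1 together in cycle 1 (32 bits), so both words fit templates with implicit NOPs. A real scheduler would also have to update the slack of dependent operations after each move, which this sketch ignores.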

Acknowledgments

This work was funded by the Academy of Finland (funding decision 253087), the Finnish Funding Agency for Technology and Innovation (project "Parallel Acceleration", funding decision 40115/13), and the ARTEMIS Joint Undertaking under grant agreement no. 621439 (ALMARVI).

Author information

Authors and Affiliations

Tampere University of Technology, Tampere, Finland
Heikki Kultala, Timo Viitanen, Pekka Jääskeläinen, Janne Helkala & Jarmo Takala

Corresponding author

Correspondence to Heikki Kultala.

About this article

Cite this article

Kultala, H., Viitanen, T., Jääskeläinen, P. et al. Improving Code Density with Variable Length Encoding Aware Instruction Scheduling. J Sign Process Syst 84, 435–446 (2016). https://doi.org/10.1007/s11265-015-1081-6

Received: 30 January 2015
Revised: 21 July 2015
Accepted: 30 October 2015
Published: 07 December 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11265-015-1081-6
