Compiling for Reduced Bit-Width Queue Processors

Canedo, Arquimedes; Abderazek, Ben A.; Sowa, Masahiro

doi:10.1007/s11265-008-0286-3

Published: 16 October 2008

Journal of Signal Processing Systems, Volume 59, pages 45–55 (2010)

Arquimedes Canedo¹,
Ben A. Abderazek² &
Masahiro Sowa³

Abstract

Embedded systems demand code with a small memory footprint. A popular architectural modification to improve code density in RISC embedded processors is to use a reduced bit-width instruction set. This approach shortens the instructions to reduce code size. However, because the reduced instructions can address fewer registers, these architectures suffer a slight performance degradation, as more reduced instructions are required to execute a given task. On the other hand, 0-operand computers such as stack and queue machines access their source and destination operands implicitly, which makes their instructions naturally short. Unlike the stack model, queue machines offer a highly parallel computation model. This paper proposes a novel alternative for reducing code size: a queue-based reduced instruction set that retains the high parallelism of programs. We introduce an efficient code generation algorithm to generate programs for our reduced instruction set. Our algorithm successfully constrains the code to the reduced instruction set with the addition of only 4% extra code, on average. We show that our proposed technique generates code that is about 16% more compact than MIPS16 code, 26% more compact than ARM/Thumb code, and 50% more compact than MIPS32 code. Furthermore, we show that our compiler is able to extract about the same parallelism as fully optimized RISC code.
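To make the 0-operand idea concrete, the following is a minimal sketch of a queue machine, not the code generation algorithm proposed in the paper. It assumes the classic scheme for expression trees, in which a level-order traversal (deepest level first, left to right) yields a program whose instructions read operands from the head of a FIFO queue and write results to its tail. The instruction names (`ld`, `add`, `sub`, `mul`) and the helper functions are hypothetical and chosen only for illustration.

```python
from collections import deque
import operator

# Hypothetical 0-operand instruction set: no instruction names a register.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Expression tree: a leaf is a variable name (str);
# an inner node is a tuple (op, left, right).

def levels(tree):
    """Group the nodes of the tree by depth, left to right within each level."""
    out, frontier = [], [tree]
    while frontier:
        out.append(frontier)
        frontier = [child
                    for node in frontier if isinstance(node, tuple)
                    for child in node[1:]]
    return out

def queue_program(tree):
    """Emit instructions by level-order traversal, deepest level first."""
    prog = []
    for level in reversed(levels(tree)):
        for node in level:
            if isinstance(node, tuple):
                prog.append((node[0],))    # operator: operands are implicit
            else:
                prog.append(("ld", node))  # load a variable onto the queue tail
    return prog

def run(prog, env):
    """Execute a queue program: read from the head, write to the tail."""
    q = deque()
    for instr in prog:
        if instr[0] == "ld":
            q.append(env[instr[1]])
        else:
            lhs = q.popleft()
            rhs = q.popleft()
            q.append(OPS[instr[0]](lhs, rhs))
    return q.popleft()

# (a + b) * (c - d)
expr = ("mul", ("add", "a", "b"), ("sub", "c", "d"))
program = queue_program(expr)
result = run(program, {"a": 1, "b": 2, "c": 7, "d": 3})
```

In this sketch the `add` and `sub` instructions sit next to each other in the program and do not depend on one another, which hints at why the queue model exposes parallelism within a level of the expression tree.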

Author information

Authors and Affiliations

IBM Tokyo Research Laboratory, 1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken, 242-8502, Japan
Arquimedes Canedo
University of Aizu, Aizu-Wakamatsu, Fukushima-ken, 965-8580, Japan
Ben A. Abderazek
University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-Shi, Tokyo, 182-8585, Japan
Masahiro Sowa

Corresponding author

Correspondence to Arquimedes Canedo.

About this article

Cite this article

Canedo, A., Abderazek, B.A. & Sowa, M. Compiling for Reduced Bit-Width Queue Processors. J Sign Process Syst Sign Image Video Technol 59, 45–55 (2010). https://doi.org/10.1007/s11265-008-0286-3

Received: 05 February 2008
Revised: 01 August 2008
Accepted: 17 September 2008
Published: 16 October 2008
Issue Date: April 2010
DOI: https://doi.org/10.1007/s11265-008-0286-3
