Compiling for Reduced Bit-Width Queue Processors

Journal of Signal Processing Systems

Abstract

Embedded systems demand code with a small memory footprint. A popular architectural modification for improving code density in embedded RISC processors is a reduced bit-width instruction set, which shortens the instructions to reduce code size. However, because the reduced instructions can address fewer registers, these architectures suffer a slight performance degradation: more reduced instructions are required to execute a given task. In contrast, 0-operand computers such as stack and queue machines access their source and destination operands implicitly, which makes their instructions naturally short. Unlike the stack model, queue machines offer a highly parallel computation model. This paper proposes a novel alternative for reducing code size: a queue-based reduced instruction set that retains the high-parallelism characteristics of programs. We introduce an efficient code generation algorithm that generates programs for our reduced instruction set. The algorithm constrains the code to the reduced instruction set while adding only 4% extra code on average. We show that the proposed technique generates code that is about 16% more compact than MIPS16 code, 26% more compact than ARM/Thumb code, and 50% more compact than MIPS32 code. Furthermore, we show that our compiler extracts about the same parallelism as fully optimized RISC code.
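
As an illustration of how a 0-operand queue machine accesses its operands implicitly, the following minimal Python sketch generates and runs queue code for a binary expression tree. It is not the code generation algorithm or the reduced instruction set proposed in the paper. It only shows the general queue computation model, in which a level-order traversal of the expression tree (deepest level first, left to right) yields a program whose instructions read from the head of the queue and write to its tail. The names `Node`, `queue_code`, `run`, and the `ld` mnemonic are hypothetical and chosen for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional
import operator


@dataclass
class Node:
    """Expression tree node: a binary operator or a leaf variable (assumed shape)."""
    op: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Hypothetical operator set for the illustration.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def queue_code(root):
    """Emit 0-operand instructions by level-order traversal, deepest level first."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    program = []
    for level in reversed(levels):
        for n in level:
            # Leaves become loads; inner nodes name no operands at all.
            program.append(("ld", n.op) if n.left is None else (n.op,))
    return program


def run(program, env):
    """Execute queue code: operands come from the head, results go to the tail."""
    q = deque()
    for instr in program:
        if instr[0] == "ld":
            q.append(env[instr[1]])
        else:
            a = q.popleft()  # left operand
            b = q.popleft()  # right operand
            q.append(OPS[instr[0]](a, b))
    return q.popleft()


# (a + b) * (c - d)
tree = Node("*", Node("+", Node("a"), Node("b")), Node("-", Node("c"), Node("d")))
program = queue_code(tree)
result = run(program, {"a": 1, "b": 2, "c": 7, "d": 4})
```

In this sketch the four loads are mutually independent, and so are the `+` and `-` instructions that follow them, which gives an intuition for why instructions on the same tree level can be issued in parallel.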

Author information

Corresponding author

Correspondence to Arquimedes Canedo.

About this article

Cite this article

Canedo, A., Abderazek, B.A. & Sowa, M. Compiling for Reduced Bit-Width Queue Processors. J Sign Process Syst Sign Image Video Technol 59, 45–55 (2010). https://doi.org/10.1007/s11265-008-0286-3

  • DOI: https://doi.org/10.1007/s11265-008-0286-3
