Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

The Journal of Supercomputing

Abstract

This work presents a static method, implemented in a compiler, for extracting high instruction level parallelism for the 32-bit QueueCore, a processor based on the queue computation model. The instructions of a queue processor read and write their operands implicitly, which keeps instructions short and makes programs free of false dependencies (the write-after-read and write-after-write conflicts caused by reusing registers). This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach because the concept of registers disappears. We propose a new, efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts 1.38 times as much parallelism as the optimizing compiler for a RISC machine. Thanks to QueueCore's reduced instruction set, we generate code that is 20% and 26% denser than that of two embedded RISC processors.
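The idea that queue instructions read and write their operands implicitly can be illustrated with a small Python sketch. It is not the QueueCore compiler algorithm proposed in this paper and does not use the QueueCore instruction set; it is a well-known level-order scheme for a pure queue machine, assuming expression trees only (no shared subexpressions) and hypothetical opcodes `ld`, `add`, `sub` and `mul`. The generator walks the tree breadth-first and emits the deepest level first, left to right; the simulator enqueues every result at the tail and dequeues operands from the head, so binary operations carry no operand fields at all.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[str]]  # (opcode, variable name for loads)

@dataclass
class Node:
    op: str                       # "ld" for a leaf, otherwise a binary operator
    name: Optional[str] = None    # variable name, only used by leaves
    kids: List["Node"] = field(default_factory=list)

def leaf(name: str) -> Node:
    return Node("ld", name)

def binop(op: str, left: Node, right: Node) -> Node:
    return Node(op, kids=[left, right])

def levels(root: Node) -> List[List[Node]]:
    """Breadth-first traversal: levels[d] holds the depth-d nodes, left to right."""
    out: List[List[Node]] = []
    frontier = [root]
    while frontier:
        out.append(frontier)
        frontier = [k for n in frontier for k in n.kids]
    return out

def gen_queue_code(root: Node) -> List[Instr]:
    """Emit the deepest level first, each level left to right (hypothetical ISA)."""
    return [(n.op, n.name) for level in reversed(levels(root)) for n in level]

OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}

def run(code: List[Instr], env: Dict[str, int]) -> int:
    """Simulate a produced-order queue machine: write at the tail, read at the head."""
    q: deque = deque()
    for op, name in code:
        if op == "ld":
            q.append(env[name])          # enqueue a variable's value
        else:
            a = q.popleft()              # first dequeued value is the left operand
            b = q.popleft()
            q.append(OPS[op](a, b))      # enqueue the result
    assert len(q) == 1, "a tree program leaves exactly one value"
    return q.popleft()

# (a + b) * (c - d)
expr = binop("mul", binop("add", leaf("a"), leaf("b")),
                    binop("sub", leaf("c"), leaf("d")))
code = gen_queue_code(expr)
print(code)
# [('ld', 'a'), ('ld', 'b'), ('ld', 'c'), ('ld', 'd'), ('add', None), ('sub', None), ('mul', None)]
print(run(code, {"a": 2, "b": 3, "c": 7, "d": 4}))  # (2+3)*(7-4) = 15
print([len(lv) for lv in reversed(levels(expr))])    # instructions per level: [4, 2, 1]
```

Every node in one level depends only on nodes in the level below, so the instructions within a level are mutually independent; in this example the level widths 4, 2 and 1 show the kind of parallelism a level-based schedule makes explicit without any register naming.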



Author information

Correspondence to Ben Abdallah Abderazek.

About this article

Cite this article

Abderazek, B.A., Masuda, M., Canedo, A. et al. Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture. J Supercomput 57, 314–338 (2011). https://doi.org/10.1007/s11227-010-0409-z

  • DOI: https://doi.org/10.1007/s11227-010-0409-z
