Abstract
This work presents a static method, implemented in a compiler, for extracting high instruction-level parallelism (ILP) for the 32-bit QueueCore, a processor based on the queue computation model. The instructions of a queue processor read and write their operands implicitly, which keeps instructions short and makes programs free of false dependencies (the write-after-read and write-after-write hazards that arise from register reuse). This characteristic allows maximum parallelism to be exploited and improves code density. Compiling for the QueueCore requires a new approach, since the concept of registers disappears. We propose a new, efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts 1.38 times as much parallelism as an optimizing compiler for a RISC machine. Thanks to the QueueCore's reduced instruction set, the compiler generates code that is 20% and 26% denser than that of two embedded RISC processors.
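To make the idea of implicit operands concrete, the Python sketch below shows a simple textbook scheme for a generic queue machine. It is not the QueueCore compiler or its algorithm. The scheme visits an expression tree level by level, deepest level first and left to right, and emits one instruction per node. Each instruction takes its operands from the head of the queue and appends its result at the tail. The names `Leaf`, `Op`, `generate`, and `run`, as well as the `ld`/`add`/`sub`/`mul` mnemonics, are hypothetical and are not taken from the QueueCore instruction set. The sketch handles only trees of binary operators. Instructions that share an inner list have no data dependencies on one another, which is the kind of parallelism a queue processor can exploit.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    name: str  # a variable loaded from memory


@dataclass
class Op:
    opcode: str  # hypothetical mnemonic: "add", "sub" or "mul"
    left: Node
    right: Node


Node = Union[Leaf, Op]
Instr = Tuple[str, Optional[str]]  # (mnemonic, variable name or None)


def levelize(root: Node) -> List[List[Node]]:
    """Group tree nodes by depth, left to right (breadth-first order)."""
    levels: List[List[Node]] = []
    frontier: List[Node] = [root]
    while frontier:
        levels.append(frontier)
        nxt: List[Node] = []
        for node in frontier:
            if isinstance(node, Op):
                nxt.extend([node.left, node.right])
        frontier = nxt
    return levels


def generate(root: Node) -> List[List[Instr]]:
    """Emit queue code, deepest level first; one inner list per tree level.

    Arithmetic instructions carry no operand fields: their inputs are
    whatever sits at the queue head when they execute.
    """
    program: List[List[Instr]] = []
    for level in reversed(levelize(root)):
        group: List[Instr] = []
        for node in level:
            if isinstance(node, Leaf):
                group.append(("ld", node.name))
            else:
                group.append((node.opcode, None))
        program.append(group)
    return program


OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


def run(program: List[List[Instr]], env: Dict[str, int]) -> int:
    """Execute the program sequentially on a simulated operand queue."""
    q: deque = deque()
    for group in program:
        for opcode, arg in group:
            if opcode == "ld":
                q.append(env[arg])   # enqueue the loaded value at the tail
            else:
                x = q.popleft()      # left operand from the head
                y = q.popleft()      # right operand from the head
                q.append(OPS[opcode](x, y))  # result goes to the tail
    if len(q) != 1:
        raise ValueError("malformed queue program")
    return q.pop()
```

For `(a + b) * (c - d)`, the sketch emits four loads, then `add` and `sub`, then `mul`. None of the arithmetic instructions names an operand, and `add` and `sub` fall in the same group because neither depends on the other.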
Cite this article
Abderazek, B.A., Masuda, M., Canedo, A. et al. Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture. J Supercomput 57, 314–338 (2011). https://doi.org/10.1007/s11227-010-0409-z