Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

Abderazek, Ben Abdallah; Masuda, Masashi; Canedo, Arquimedes; Kuroda, Kenichi

doi:10.1007/s11227-010-0409-z

Published: 12 March 2010

The Journal of Supercomputing, Volume 57, pages 314–338 (2011)

Abstract

This work presents a static method, implemented in a compiler, for extracting high instruction-level parallelism for the 32-bit QueueCore, a processor based on queue computation. The instructions of a queue processor read and write their operands implicitly, which makes instructions short and keeps programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach because the concept of registers disappears. We propose a new, efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for a RISC machine by a factor of 1.38. Through the use of QueueCore’s reduced instruction set, we generate code that is 20% and 26% denser, respectively, than that of two embedded RISC processors.
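
To make the implicit operand access concrete, the following is a minimal sketch of the general queue computation model, not the code generation algorithm proposed in this article and not the QueueCore instruction set. It assumes expression trees written as hypothetical `(op, left, right)` tuples with variable names as leaves, and it uses the well-known property that a level-order traversal of an expression tree, deepest level first, yields a valid queue program. The names `queue_program`, `run`, and the `ld` mnemonic are illustrative assumptions.

```python
from collections import deque
import operator

# Illustrative operation set; not the QueueCore ISA.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def queue_program(tree):
    """Emit instructions by level-order traversal, deepest level first."""
    levels = []
    frontier = [tree]
    while frontier:
        levels.append(frontier)
        # Children of every operation node, kept in left-to-right order.
        frontier = [
            child
            for node in frontier
            if isinstance(node, tuple)
            for child in node[1:]
        ]
    program = []
    for level in reversed(levels):
        for node in level:
            if isinstance(node, tuple):
                program.append((node[0],))    # operation: no operand names
            else:
                program.append(("ld", node))  # load a variable into the queue
    return program


def run(program, env):
    """Execute on a FIFO queue: read at the head, write at the tail."""
    q = deque()
    for instr in program:
        if instr[0] == "ld":
            q.append(env[instr[1]])
        else:
            lhs = q.popleft()
            rhs = q.popleft()
            q.append(OPS[instr[0]](lhs, rhs))
    return q.popleft()


# (a + b) * (c - d)
expr = ("*", ("+", "a", "b"), ("-", "c", "d"))
program = queue_program(expr)
result = run(program, {"a": 1, "b": 2, "c": 7, "d": 3})
```

In this sketch the arithmetic instructions carry no operand fields at all, which is the source of the short encoding, and the instructions emitted for a single tree level do not depend on one another, so they are candidates for parallel issue. Handling general data-flow graphs, where a value may be consumed more than once, requires more than this simple traversal.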

Author information

Authors and Affiliations

School of Computer Science and Engineering, Adaptive Systems Laboratory, The University of Aizu, Fukushima-ken, Aizu-Wakamatsu-shi, 965-8580, Japan
Ben Abdallah Abderazek, Masashi Masuda, Arquimedes Canedo & Kenichi Kuroda
IBM Tokyo Research Laboratory, 1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken, 242-8502, Japan
Arquimedes Canedo

Corresponding author

Correspondence to Ben Abdallah Abderazek.

About this article

Cite this article

Abderazek, B.A., Masuda, M., Canedo, A. et al. Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture. J Supercomput 57, 314–338 (2011). https://doi.org/10.1007/s11227-010-0409-z

Issue Date: September 2011