Design space exploration of an optimized compiler approach for a generic reconfigurable array architecture

The Journal of Supercomputing

Abstract

Several mesh-like coarse-grained reconfigurable architectures have been devised in the last few years, accompanied by their corresponding mapping flows. One of the major bottlenecks in mapping algorithms onto these architectures is the limited memory access bandwidth. Only a few mapping methodologies have addressed the problem of limited bandwidth, and none has explored how the performance improvements are affected by the architectural characteristics. In this paper we study the impact that the architectural parameters have on the performance speedups achieved when the local RAMs of the processing elements (PEs) are used for storing the variables with data reuse opportunities. The data reuse values are transferred over the internal interconnection network instead of being fetched from external memories, in order to reduce the data transfer burden on the bus network. A novel mapping algorithm that uses a list scheduling technique is also proposed. The experimental results quantify the trade-offs that exist between the performance improvements and the memory access latency, the interconnection network, and the processing element’s local RAM size. For this reason, our mapping methodology targets a flexible architecture template, which permits such an exploration. More specifically, the experiments show that the improvements increase with the memory access latency, while a richer interconnection topology can improve the operation parallelism by a factor of 1.4 on average. Finally, for the considered set of benchmarks, the operation parallelism improved from 8.6% to 85.1% with the application of our methodology and with each PE’s local RAM sized at 8 words.
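To make the idea of list scheduling under limited memory bandwidth concrete, the following is a minimal sketch and not the mapping algorithm proposed in the paper. It assumes unit-latency operations, a fixed number of PEs, and a fixed number of external loads per cycle; the function name `list_schedule`, its parameters, and the two-output filter example are all hypothetical. The example compares a kernel that reloads a shared input from external memory with one that keeps it in a PE's local RAM and so needs one load fewer.

```python
def list_schedule(ops, deps, num_pes, mem_ports, loads):
    """Greedy list scheduling of unit-latency operations.

    ops:       operation names in priority order (earlier = higher priority)
    deps:      dict mapping an operation to the set of its predecessors
    num_pes:   operations that may issue per cycle (assumed PE count)
    mem_ports: external loads that may issue per cycle (assumed bus limit)
    loads:     set of operations that fetch from external memory
    Returns a dict mapping each operation to its issue cycle.
    """
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        used_pes = 0
        used_ports = 0
        for op in ops:
            if op in schedule:
                continue
            # An operation is ready once all predecessors issued earlier.
            preds = deps.get(op, ())
            if any(p not in schedule or schedule[p] >= cycle for p in preds):
                continue
            if used_pes == num_pes:
                break  # no free PE left in this cycle
            if op in loads:
                if used_ports == mem_ports:
                    continue  # bus saturated, try the next candidate
                used_ports += 1
            schedule[op] = cycle
            used_pes += 1
        cycle += 1
    return schedule


def length(schedule):
    """Number of cycles the schedule occupies."""
    return max(schedule.values()) + 1


# Hypothetical kernel: y0 = x0*c0 + x1*c1 and y1 = x1*c0 + x2*c1.
# Without reuse, x1 is fetched twice from external memory.
no_reuse_ops = ["ld_x0", "ld_x1a", "ld_x1b", "ld_x2",
                "mul0", "mul1", "mul2", "mul3", "add0", "add1"]
no_reuse_deps = {"mul0": {"ld_x0"}, "mul1": {"ld_x1a"},
                 "mul2": {"ld_x1b"}, "mul3": {"ld_x2"},
                 "add0": {"mul0", "mul1"}, "add1": {"mul2", "mul3"}}
no_reuse_loads = {"ld_x0", "ld_x1a", "ld_x1b", "ld_x2"}

# With reuse, x1 is fetched once and then read from local RAM.
reuse_ops = ["ld_x0", "ld_x1", "ld_x2",
             "mul0", "mul1", "mul2", "mul3", "add0", "add1"]
reuse_deps = {"mul0": {"ld_x0"}, "mul1": {"ld_x1"},
              "mul2": {"ld_x1"}, "mul3": {"ld_x2"},
              "add0": {"mul0", "mul1"}, "add1": {"mul2", "mul3"}}
reuse_loads = {"ld_x0", "ld_x1", "ld_x2"}

no_reuse_len = length(list_schedule(no_reuse_ops, no_reuse_deps, 4, 1, no_reuse_loads))
reuse_len = length(list_schedule(reuse_ops, reuse_deps, 4, 1, reuse_loads))
print(no_reuse_len, reuse_len)  # prints "6 5"
```

With four PEs and a single load per cycle, the bus rather than the PE count bounds both schedules, which is why removing one external load shortens the schedule by one cycle in this toy case. The sketch does not model multi-cycle memory latency, the interconnection topology, or local RAM capacity, all of which the paper treats as exploration parameters.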

Author information

Corresponding author

Correspondence to Grigoris Dimitroulakos.

About this article

Cite this article

Dimitroulakos, G., Galanis, M.D. & Goutis, C.E. Design space exploration of an optimized compiler approach for a generic reconfigurable array architecture. J Supercomput 40, 127–157 (2007). https://doi.org/10.1007/s11227-006-0016-1

  • DOI: https://doi.org/10.1007/s11227-006-0016-1
