Abstract
Several mesh-like coarse-grained reconfigurable architectures have been devised in the last few years, together with their corresponding mapping flows. One of the major bottlenecks in mapping algorithms onto these architectures is the limited memory access bandwidth. Only a few mapping methodologies have addressed the limited-bandwidth problem, and none has explored how the architectural characteristics affect the resulting performance improvements. In this paper we study the impact of the architectural parameters on the speedups achieved when the local RAMs of the processing elements (PEs) store the variables that have data reuse opportunities. The reused values are transferred over the internal interconnection network instead of being fetched from external memories, which reduces the data transfer burden on the bus network. A novel mapping algorithm that uses a list scheduling technique is also proposed. The experimental results quantify the trade-offs between the performance improvements and the memory access latency, the interconnection network, and the size of each PE’s local RAM. To permit such an exploration, our mapping methodology targets a flexible architecture template. More specifically, the experiments showed that the improvements increase with the memory access latency, while a richer interconnection topology can improve the operation parallelism by a factor of 1.4 on average. Finally, for the considered set of benchmarks, applying our methodology with a local RAM of 8 words per PE improved the operation parallelism from 8.6% to 85.1%.
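The interaction between list scheduling, bus bandwidth and data reuse can be illustrated with a small sketch. This is not the mapping algorithm proposed in the paper, only a generic list scheduler under simplifying assumptions: every operation takes one cycle, the bus offers a fixed number of load slots per cycle, and a reused operand costs no bus slot. The names `Op`, `heights`, `list_schedule`, `num_pes` and `bus_slots`, as well as the example graph, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    preds: list = field(default_factory=list)  # names of producer operations
    loads: int = 0   # operands fetched from external memory over the bus
    reused: int = 0  # how many of those loads a PE-local RAM could serve instead


def heights(ops):
    """Longest path (counted in ops) from each op to a sink, used as priority."""
    succs = {o.name: [] for o in ops}
    for o in ops:
        for p in o.preds:
            succs[p].append(o.name)
    memo = {}

    def height(n):
        if n not in memo:
            memo[n] = 1 + max((height(s) for s in succs[n]), default=0)
        return memo[n]

    return {o.name: height(o.name) for o in ops}


def list_schedule(ops, num_pes, bus_slots, use_reuse):
    """Return a list of cycles, each a list of the op names issued in it."""
    prio = heights(ops)
    done, pending, cycles = set(), list(ops), []
    while pending:
        free_pes, free_bus, issued = num_pes, bus_slots, []
        # An op is ready once all of its producers finished in earlier cycles.
        ready = [o for o in pending if all(p in done for p in o.preds)]
        for o in sorted(ready, key=lambda o: -prio[o.name]):
            # Assumption: a reused operand is read locally and needs no bus slot.
            cost = o.loads - (o.reused if use_reuse else 0)
            if free_pes > 0 and cost <= free_bus:
                free_pes -= 1
                free_bus -= cost
                issued.append(o)
        if not issued:
            raise ValueError("no op can be issued (cyclic graph or too few bus slots)")
        done.update(o.name for o in issued)
        pending = [o for o in pending if o.name not in done]
        cycles.append([o.name for o in issued])
    return cycles


# Hypothetical kernel: four multiplies with two memory operands each,
# reduced by a small adder tree.
graph = [Op(f"m{i}", loads=2, reused=1) for i in range(4)] + [
    Op("a0", preds=["m0", "m1"]),
    Op("a1", preds=["m2", "m3"]),
    Op("s", preds=["a0", "a1"]),
]
without_reuse = list_schedule(graph, num_pes=4, bus_slots=2, use_reuse=False)
with_reuse = list_schedule(graph, num_pes=4, bus_slots=2, use_reuse=True)
for label, cycles in (("no reuse", without_reuse), ("reuse", with_reuse)):
    print(label, len(cycles), "cycles,", round(len(graph) / len(cycles), 2), "ops/cycle")
```

In this toy setting the bus, not the number of PEs, limits how many multiplies can start per cycle, so serving one operand of each multiply locally shortens the schedule. The sketch ignores memory access latency, interconnect topology and local RAM capacity, which are the parameters the paper explores.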
Cite this article
Dimitroulakos, G., Galanis, M.D. & Goutis, C.E. Design space exploration of an optimized compiler approach for a generic reconfigurable array architecture. J Supercomput 40, 127–157 (2007). https://doi.org/10.1007/s11227-006-0016-1