Abstract
Domain-specific coarse-grained reconfigurable architectures (CGRAs) hold great promise for energy-efficient, flexible designs that serve a suite of applications. Designing such a reconfigurable device for an application domain is very challenging because the needs of different applications must be carefully balanced to achieve the targeted design goals. It requires evaluating many potential architectural options to select an optimal solution. Exploring the design space manually would be very time-consuming and may not even be feasible for very large designs. Even mapping one algorithm onto a customized architecture can take anywhere from minutes to hours, and running a full power simulation on a complete suite of benchmarks for various architectural options requires several days. Consequently, finding the optimal point in a design space could take a very long time. We have designed a framework that makes such design space exploration (DSE) feasible. It allows a family of algorithms and architectural options to be tested in minutes rather than days, enabling rapid selection of architectural choices. In this paper, we describe our DSE framework for domain-specific reconfigurable computing, in which the needs of the application domain drive the construction of the device architecture. The framework automates design space case studies, allowing application developers to explore architectural tradeoffs efficiently and reach solutions quickly. For our case studies, we selected some of the core signal processing benchmarks from the MediaBench benchmark suite and some edge-detection benchmarks from the image processing domain. We describe two search algorithms: a stepped search algorithm motivated by our manual design studies and a more traditional gradient-based optimization. In each case, approximate energy models are developed to guide the search toward a minimal-energy solution.
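The abstract does not spell out either search procedure. The following Python is therefore only a sketch under stated assumptions about how a stepped search and a gradient-based search might walk an integer design space under an approximate energy model:

- The parameter names, their bounds, and `toy_energy` are hypothetical stand-ins, not the paper's parameters or models.
- `stepped_search` is read as stepping one parameter at a time.
- `gradient_search` estimates gradients with finite differences on the integer grid.

```python
from typing import Callable, Dict, Tuple

Point = Dict[str, int]
EnergyModel = Callable[[Point], float]

# Hypothetical architectural parameters and integer bounds (assumptions,
# not the parameters studied in the paper).
BOUNDS: Dict[str, Tuple[int, int]] = {
    "mux_width": (2, 8),         # inputs per interconnect multiplexer
    "dedicated_routes": (0, 6),  # dedicated pass-gate routes per row
    "alus_per_row": (4, 16),     # ALUs in each fabric row
}


def toy_energy(p: Point) -> float:
    """Hypothetical approximate energy model with one minimum at (5, 3, 10)."""
    return (100.0
            + (p["mux_width"] - 5) ** 2
            + 2 * (p["dedicated_routes"] - 3) ** 2
            + 0.5 * (p["alus_per_row"] - 10) ** 2)


def clamp(name: str, value: int) -> int:
    """Keep a parameter inside its allowed range."""
    lo, hi = BOUNDS[name]
    return max(lo, min(hi, value))


def stepped_search(energy: EnergyModel, start: Point, step: int = 1) -> Point:
    """Step one parameter at a time, keeping any move that lowers energy;
    repeat full sweeps until no single-parameter step helps."""
    best, best_e = dict(start), energy(start)
    improved = True
    while improved:
        improved = False
        for name in BOUNDS:
            for delta in (-step, step):
                cand = dict(best)
                cand[name] = clamp(name, best[name] + delta)
                cand_e = energy(cand)
                if cand_e < best_e:
                    best, best_e, improved = cand, cand_e, True
    return best


def gradient_search(energy: EnergyModel, start: Point, max_iters: int = 100) -> Point:
    """Discrete steepest descent: estimate each partial derivative with a
    central difference, then move every parameter one unit against the sign
    of its estimate. Stops when the combined move no longer lowers energy."""
    current, current_e = dict(start), energy(start)
    for _ in range(max_iters):
        cand = {}
        for name in BOUNDS:
            up = {**current, name: clamp(name, current[name] + 1)}
            down = {**current, name: clamp(name, current[name] - 1)}
            slope = energy(up) - energy(down)
            direction = -1 if slope > 0 else (1 if slope < 0 else 0)
            cand[name] = clamp(name, current[name] + direction)
        cand_e = energy(cand)
        if cand_e >= current_e:
            break
        current, current_e = cand, cand_e
    return current


start = {"mux_width": 2, "dedicated_routes": 0, "alus_per_row": 16}
print(stepped_search(toy_energy, start))
# → {'mux_width': 5, 'dedicated_routes': 3, 'alus_per_row': 10}
print(gradient_search(toy_energy, start))
# → {'mux_width': 5, 'dedicated_routes': 3, 'alus_per_row': 10}
```

On this convex toy model both functions reach the same point. On a non-convex energy surface they can stop at different local minima.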
We validate our search results in two ways. First, we compare the architectural solutions selected by our tool to an architecture optimized manually. Second, we perform sensitivity tests that evaluate the ability of our algorithms to find good-quality minima in the design space. All selected fabric architectures were synthesized using IBM's 130 nm cell-based ASIC fabrication process. The architectures selected by the two approaches consume almost the same amount of energy on average, but the gradient-based approach is more general and promises to extend well to new problem domains. We expect these or similar heuristics, and the overall design flow of the system, to be useful for a wide range of architectures, including mesh-based and other commonly used CGRA architectures.
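A sensitivity test of this kind might look like the sketch below. It restarts a simple local search from many random points. It then reports how often the search lands within a tolerance of the global minimum, which on a small toy space can be found exhaustively. Everything here is an assumption for illustration: the energy surface, the parameter ranges, the 1% tolerance, and `local_search`, which is plain best-improvement descent rather than either of the paper's algorithms.

```python
import itertools
import random
from typing import Callable, Iterator, Tuple

Point = Tuple[int, ...]
EnergyModel = Callable[[Point], float]

# Hypothetical two-parameter design space (e.g., multiplexer width and
# number of dedicated routes); ranges are illustrative assumptions.
RANGES = [range(2, 9), range(0, 7)]


def toy_energy(p: Point) -> float:
    """Illustrative energy surface: global minimum 100 at (5, 3) and a
    shallower local minimum 103 at (2, 6). Not measured data."""
    a, b = p
    global_basin = (a - 5) ** 2 + (b - 3) ** 2
    local_basin = (a - 2) ** 2 + (b - 6) ** 2 + 3
    return 100.0 + min(global_basin, local_basin)


def neighbors(p: Point) -> Iterator[Point]:
    """Points reachable by changing one parameter by +/-1 within its range."""
    for i, r in enumerate(RANGES):
        for d in (-1, 1):
            v = p[i] + d
            if v in r:
                yield p[:i] + (v,) + p[i + 1:]


def local_search(energy: EnergyModel, start: Point) -> Point:
    """Best-improvement descent; returns the first local minimum reached."""
    current = start
    while True:
        best = min(neighbors(current), key=energy)
        if energy(best) >= energy(current):
            return current
        current = best


def sensitivity(energy: EnergyModel, trials: int = 200,
                tolerance: float = 0.01, seed: int = 0) -> float:
    """Fraction of random starts whose local minimum lies within `tolerance`
    (relative) of the exhaustively computed global minimum."""
    rng = random.Random(seed)
    global_min = min(energy(p) for p in itertools.product(*RANGES))
    hits = 0
    for _ in range(trials):
        start = tuple(rng.choice(r) for r in RANGES)
        if energy(local_search(energy, start)) <= global_min * (1 + tolerance):
            hits += 1
    return hits / trials


print(local_search(toy_energy, (8, 0)))  # → (5, 3)
print(local_search(toy_energy, (2, 6)))  # → (2, 6), stuck at the local minimum
print(f"hit rate: {sensitivity(toy_energy):.2f}")
```

A hit rate below 1.0 signals that some starting points get trapped in a poorer minimum.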
Notes
We note here that alternatives to this arrangement of dedicated pass gates are possible. In particular, we could provide dedicated routes alongside each ALU so that the ALU can be bypassed. However, we found such an arrangement to be expensive within our design space because of the additional multiplexers it requires, and we do not consider it here. Instead, we search for the most efficient proportion of dedicated routes to provide. This keeps the number of additional multiplexers at the level that best balances the energy they save against the energy they cost.
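The trade-off described in this note can be pictured with a small sweep over candidate proportions. In the hypothetical Python sketch below, `fabric_energy` makes three modelling assumptions:

- Savings from dedicated routes show diminishing returns.
- The added multiplexer cost grows linearly with the proportion of routes provided.
- All constants are invented for illustration.

The paper's actual cost model is not reproduced here.

```python
from typing import List, Tuple

# Hypothetical constants in arbitrary energy units, not values from the paper.
BASE_ENERGY = 100.0  # energy with no dedicated routes
MAX_SAVINGS = 40.0   # savings if every pass-through used a dedicated route
MUX_COST = 40.0      # added multiplexer energy if all routes were dedicated


def fabric_energy(fraction: float) -> float:
    """Approximate energy when `fraction` of candidate routes are dedicated.
    Savings saturate (diminishing returns); multiplexer cost grows linearly."""
    savings = MAX_SAVINGS * fraction * (2.0 - fraction)
    return BASE_ENERGY - savings + MUX_COST * fraction


def best_proportion(steps: int = 8) -> Tuple[float, List[Tuple[float, float]]]:
    """Sweep fractions 0, 1/steps, ..., 1 and return the lowest-energy one
    together with the full (fraction, energy) table."""
    table = [(k / steps, fabric_energy(k / steps)) for k in range(steps + 1)]
    best_fraction, _ = min(table, key=lambda row: row[1])
    return best_fraction, table


frac, table = best_proportion()
for f, e in table:
    print(f"{f:.3f}  {e:.3f}")
print("best fraction:", frac)  # → best fraction: 0.5
```

With this particular model, the continuous optimum is f* = 1 − MUX_COST / (2 · MAX_SAVINGS) = 0.5, and the sweep recovers it. A real fabric would replace `fabric_energy` with measured or simulated energy.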
Cite this article
Mehta, G., Jones, A.K. Implementation and validation of architectural space exploration techniques for domain-specific reconfigurable computing. Des Autom Embed Syst 17, 27–51 (2013). https://doi.org/10.1007/s10617-013-9118-1