Abstract
By using coarse-grained operands and simplified interconnection patterns, coarse-grained reconfigurable architectures (CGRAs) have proven energy efficient in several specific domains. The speed at which contexts are applied to the processing element array (PEA) directly determines CGRA performance. This paper extends the CGRA design space along the configuration-granularity dimension with a medium-grained scheme, the row-based configuration mechanism (RCM). The most prominent feature of RCM is that a large data flow graph (DFG) can be mapped onto a small array in a single reconfiguration, which is carried out row by row. Compared with a conventional DFG-partitioning solution, RCM substantially reduces both reconfiguration time and data transfer time, and it stores contexts much more efficiently. Relative to DFG partitioning, performance improves by 2.6% to 57.8%, at a penalty of only 4.79% in area and 7.22% in power. RCM has been adopted in a reconfigurable processor called REMUS HPA (reconfigurable multimedia system, high-performance version advanced), which has been implemented on a 50.5 mm² die in TSMC 65 nm technology. Simulation shows that H.264 high-profile decoding reaches 1920×1088 at 37 fps with a 200 MHz operating frequency. Compared with the high-performance version of XPP, a commercial reconfigurable processor, performance is improved by 247%.
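The abstract states only that RCM maps a large DFG onto a small array in one reconfiguration performed row by row, and that this cuts reconfiguration and data transfer time relative to DFG partitioning. The Python sketch below is a hypothetical toy model of that contrast, not the authors' implementation. It assumes that DFG level i runs on physical row i mod R, that each row is rewritten with the next level's context once it finishes, and that a hypothetical feedback path carries results from the last row back to the first so intermediate data never leaves the array. The names `MappingCost`, `partition_mapping`, `row_based_mapping`, and `row_assignment` are illustrative only.

```python
"""Hypothetical toy model, not the REMUS HPA implementation: contrasts DFG
partitioning with row-by-row mapping of a deep DFG onto a PEA with few rows."""

from dataclasses import dataclass
from math import ceil


@dataclass
class MappingCost:
    full_array_reconfigs: int  # configuration events that set up the whole PEA
    row_context_loads: int     # row contexts written into the PEA
    memory_round_trips: int    # intermediate results spilled to memory and read back


def partition_mapping(num_levels: int, pea_rows: int) -> MappingCost:
    """Split an L-level DFG into ceil(L / R) sub-graphs. Each sub-graph needs
    its own full-array configuration, and each boundary between consecutive
    sub-graphs spills intermediate data to memory."""
    parts = ceil(num_levels / pea_rows)
    return MappingCost(
        full_array_reconfigs=parts,
        row_context_loads=parts * pea_rows,  # every full context covers all R rows
        memory_round_trips=max(parts - 1, 0),
    )


def row_based_mapping(num_levels: int, pea_rows: int) -> MappingCost:
    """Assumed row-by-row scheme: one configuration event, after which each
    physical row is rewritten with the next DFG level once it finishes."""
    if pea_rows <= 0:
        raise ValueError("pea_rows must be positive")
    return MappingCost(
        full_array_reconfigs=1,
        row_context_loads=num_levels,  # one row context per DFG level
        memory_round_trips=0,          # relies on the hypothetical feedback path
    )


def row_assignment(num_levels: int, pea_rows: int) -> list[int]:
    """Physical row that executes each DFG level (round-robin assumption)."""
    return [level % pea_rows for level in range(num_levels)]


if __name__ == "__main__":
    levels, rows = 10, 4  # hypothetical 10-level DFG on a 4-row PEA
    print(row_assignment(levels, rows))  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
    print(partition_mapping(levels, rows))
    print(row_based_mapping(levels, rows))
```

For a 10-level DFG on a 4-row PEA, the partitioning model needs three full-array configurations and two round trips through memory, while the row-based model needs one configuration event and ten row-context loads. Real costs depend on context sizes, bus widths, and how far context loading overlaps with execution, none of which the abstract specifies.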
Cite this article
Liu, L., Wang, Y., Yin, S. et al. Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture. Sci. China Inf. Sci. 57, 1–18 (2014). https://doi.org/10.1007/s11432-013-4973-8