Abstract
This paper presents the design, development, and evaluation of HARP2, an eXtra-large-scale homogeneous and heterogeneous Accelerator-Rich Platform for massively parallel signal processing algorithms. HARP is an integrated platform of multiple Coarse-Grained Reconfigurable Arrays (CGRAs) connected over a Network-on-Chip (NoC), where each CGRA is scaled and tailored to a specific application. The NoC consists of nine nodes arranged in 3 rows × 3 columns and serves as the communication backbone between the CGRAs. In this experimental work, the HARP template is used to instantiate a homogeneous platform (HARP-hom) and a heterogeneous platform (HARP-het). HARP-het is generated as a proof-of-concept test to verify the design and functionality of HARP. It also provides insight into many features of the design and into its evaluation in terms of different performance metrics. HARP-hom is instantiated for a more realistic design problem: satisfying the execution-time constraints imposed on Fast Fourier Transform (FFT) processing in IEEE 802.11n demodulators. Both versions of HARP are compared against existing state-of-the-art platforms using different performance metrics. The HARP versions are designed to illustrate large-scale homogeneous and heterogeneous multicore architectures, while demonstrating the advantages of maximizing the number of reconfigurable processing resources on a single chip.
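The nine-node, 3 × 3 NoC can be illustrated with a small model. The following is a hypothetical sketch, not the authors' implementation. It assumes a 2D mesh with links only between horizontally and vertically adjacent nodes, row-major node numbering (0 to 8), and dimension-ordered XY routing. The abstract specifies none of these choices. The sketch computes the route and hop count between any two nodes, which offers one way to reason about the distance between CGRAs attached to different NoC nodes.

```python
from itertools import product

ROWS, COLS = 3, 3  # nine NoC nodes in 3 rows x 3 columns, as stated in the abstract


def node_coords(node_id: int) -> tuple[int, int]:
    """Map a node index to (row, col); row-major numbering is an assumption."""
    if not 0 <= node_id < ROWS * COLS:
        raise ValueError(f"node_id must be in [0, {ROWS * COLS})")
    return divmod(node_id, COLS)


def xy_route(src: int, dst: int) -> list[int]:
    """Hypothetical dimension-ordered (XY) route on an assumed 2D mesh:
    travel along the row (X) first, then along the column (Y)."""
    r, c = node_coords(src)
    dst_r, dst_c = node_coords(dst)
    path = [src]
    step = 1 if dst_c > c else -1
    while c != dst_c:          # X phase: move horizontally
        c += step
        path.append(r * COLS + c)
    step = 1 if dst_r > r else -1
    while r != dst_r:          # Y phase: move vertically
        r += step
        path.append(r * COLS + c)
    return path


def hop_count(src: int, dst: int) -> int:
    """Number of links traversed between two nodes under the assumed routing."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    n = ROWS * COLS
    hops = [hop_count(s, d) for s, d in product(range(n), repeat=2) if s != d]
    print("route 0 -> 8:", xy_route(0, 8))
    print("max hops:", max(hops))          # opposite corners of the grid
    print("mean hops:", sum(hops) / len(hops))
```

Under these assumptions, the longest route runs between opposite corners of the grid (four hops), and the average over all ordered pairs of distinct nodes is two hops. A different topology or routing policy would change both figures.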
Acknowledgment
This research was conducted jointly by the Department of Electronics and Communications Engineering, Tampere University of Technology, Finland, and the Department of Computer Science, University of Chicago, Illinois, USA. It was partially funded by the Academy of Finland under contract no. 258506 (DEFT: Design of a Highly-parallel Heterogeneous MP-SoC Architecture for Future Wireless Technologies) and by the Tampere Doctoral Programme in Information Science and Engineering, Finland. The Department of Computer Science, University of Chicago, Illinois, USA, also provided financial and on-site resources for the implementation.
The authors sincerely thank Bob Bartlett, Director of the Technical Staff at the Department of Computer Science, University of Chicago, IL, USA, for his consistent training and support, and for making the required computing infrastructure available at short notice for the implementation of this research.
Cite this article
Hussain, W., Airoldi, R., Hoffmann, H. et al. HARP2: An X-Scale Reconfigurable Accelerator-Rich Platform for Massively-Parallel Signal Processing Algorithms. J Sign Process Syst 85, 341–353 (2016). https://doi.org/10.1007/s11265-015-1054-9
DOI: https://doi.org/10.1007/s11265-015-1054-9