HARP²: An X-Scale Reconfigurable Accelerator-Rich Platform for Massively-Parallel Signal Processing Algorithms

Hussain, Waqar; Airoldi, Roberto; Hoffmann, Henry; Ahonen, Tapani; Nurmi, Jari

doi:10.1007/s11265-015-1054-9

Published: 17 October 2015

Volume 85, pages 341–353, (2016)

Journal of Signal Processing Systems

Abstract

This paper presents the design, development and evaluation of an eXtra-large Scale, Homogeneous and Heterogeneous Accelerator-Rich Platform (HARP²) for massively parallel signal processing algorithms. HARP is an integrated platform of multiple Coarse-Grained Reconfigurable Arrays (CGRAs) connected over a Network-on-Chip (NoC), where each CGRA is scaled and tailored for a specific application. The NoC consists of nine nodes arranged in a topology of 3 rows × 3 columns and acts as the communication backbone between the CGRAs. In this experimental work, the HARP template is used to instantiate a homogeneous (HARP-hom) and a heterogeneous (HARP-het) platform. HARP-het is generated as a proof-of-concept test to verify the design and functionality of HARP. It also provides insight into many features of the design and its evaluation in terms of different performance metrics. The other version, HARP-hom, is instantiated for a relatively realistic design problem, i.e., satisfying the execution-time constraints imposed on Fast Fourier Transform processing in IEEE 802.11n demodulators. Both versions of HARP are compared, using different performance metrics, against some of the existing state-of-the-art platforms. The HARP versions are designed to illustrate large-scale homogeneous/heterogeneous multicore architectures while presenting the advantages of maximizing the number of reconfigurable processing resources on a single chip.
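
The nine-node, 3 rows × 3 columns NoC described in the abstract can be pictured with a small model. The sketch below is only an illustration, not the authors' implementation: it assumes the nodes form a plain 2D mesh with links between vertical and horizontal neighbours and that a transfer takes the shortest grid path. The abstract states neither of these, and the names `neighbours` and `hop_count` are hypothetical.

```python
from itertools import product

# From the abstract: nine NoC nodes in 3 rows x 3 columns.
ROWS, COLS = 3, 3


def neighbours(row, col):
    """Return the grid positions directly linked to (row, col).

    Assumption (not from the paper): a plain 2D mesh with links only
    between vertically and horizontally adjacent nodes.
    """
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return [
        (row + dr, col + dc)
        for dr, dc in steps
        if 0 <= row + dr < ROWS and 0 <= col + dc < COLS
    ]


def hop_count(src, dst):
    """Hops between two nodes, assuming shortest-path routing on the mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


# Every node position, and the assumed links leaving each node.
nodes = list(product(range(ROWS), range(COLS)))
links = {node: neighbours(*node) for node in nodes}

# Worst-case distance between any two nodes under the assumptions above.
max_hops = max(hop_count(a, b) for a in nodes for b in nodes)
```

Under these assumptions the centre node has four neighbours and a corner node has two, so where a CGRA is attached on the grid would affect how many hops its transfers need.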

Acknowledgment

This research work was jointly conducted by the Department of Electronics and Communications Engineering, Tampere University of Technology, Finland, and the Department of Computer Science, University of Chicago, Illinois, USA. It was partially funded by the Academy of Finland under contract #258506 (DEFT: Design of a Highly-parallel Heterogeneous MP-SoC Architecture for Future Wireless Technologies) and by the Tampere Doctoral Programme in Information Science and Engineering, Finland. The Department of Computer Science, University of Chicago, Illinois, USA, also provided the financial and on-site resources for its implementation.

The authors sincerely thank Bob Bartlett, Director of the Technical Staff at the Department of Computer Science, University of Chicago, IL, USA, for his consistent training and support, and for making the required computing infrastructure available at short notice for the implementation of this research work.

Author information

Authors and Affiliations

Department of Electronics and Communications Engineering, Tampere University of Technology, P.O. Box 527, FI-33101, Tampere, Finland
Waqar Hussain, Roberto Airoldi, Tapani Ahonen & Jari Nurmi
Department of Computer Science, The University of Chicago, Ryerson Hall 250, 1100 E. 58th Street, Chicago, IL, 60637, USA
Henry Hoffmann

Corresponding author

Correspondence to Waqar Hussain.

About this article

Cite this article

Hussain, W., Airoldi, R., Hoffmann, H. et al. HARP²: An X-Scale Reconfigurable Accelerator-Rich Platform for Massively-Parallel Signal Processing Algorithms. J Sign Process Syst 85, 341–353 (2016). https://doi.org/10.1007/s11265-015-1054-9

Received: 12 December 2014
Revised: 23 June 2015
Accepted: 21 September 2015
Published: 17 October 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11265-015-1054-9
