HARP2: An X-Scale Reconfigurable Accelerator-Rich Platform for Massively-Parallel Signal Processing Algorithms

Journal of Signal Processing Systems

Abstract

This paper presents the design, development and evaluation of an eXtra-large-scale, Homogeneous and Heterogeneous Accelerator-Rich Platform (HARP2) for massively parallel signal processing algorithms. HARP is an integrated platform of multiple Coarse-Grained Reconfigurable Arrays (CGRAs) connected over a Network-on-Chip (NoC), where each CGRA is scaled and tailored for a specific application. The NoC consists of nine nodes arranged in a topology of 3 rows × 3 columns and acts as the communication backbone between the CGRAs. In this experimental work, the HARP template is used to instantiate a homogeneous (HARP-hom) and a heterogeneous (HARP-het) platform. HARP-het is generated as a proof-of-concept test to verify the design and functionality of HARP. It also provides insight into many features of the design and its evaluation in terms of different performance metrics. HARP-hom is instantiated for a relatively realistic design problem, i.e., satisfying the execution-time constraints imposed on Fast Fourier Transform processing in IEEE-802.11n demodulators. Both versions of HARP are compared, using different performance metrics, against some of the existing state-of-the-art platforms. The HARP versions are designed to illustrate large-scale homogeneous/heterogeneous multicore architectures while presenting the advantages of maximizing the number of reconfigurable processing resources on a single chip.
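The nine-node, 3 × 3 NoC arrangement described in the abstract can be pictured with a small model. The following is only an illustrative sketch: the abstract states the node count and the row/column layout, but the mesh links between adjacent nodes and the dimension-ordered route used here are assumptions made for the example, and the names `neighbors` and `xy_route` are hypothetical rather than taken from the paper.

```python
from itertools import product

ROWS, COLS = 3, 3  # nine NoC nodes in 3 rows x 3 columns, as stated in the abstract


def neighbors(node):
    """Return the adjacent nodes of (row, col).

    Assumption: nodes are linked as a plain 2D mesh; the abstract does not
    specify the link structure.
    """
    r, c = node
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < ROWS and 0 <= j < COLS]


def xy_route(src, dst):
    """Hypothetical dimension-ordered route: along the row first, then the column.

    Assumption: this routing policy is chosen only for illustration.
    """
    path = [src]
    r, c = src
    while c != dst[1]:  # step through columns until aligned with the destination
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:  # then step through rows
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path


nodes = list(product(range(ROWS), range(COLS)))
# Undirected links: each pair of adjacent nodes counted once.
links = {frozenset((n, m)) for n in nodes for m in neighbors(n)}

if __name__ == "__main__":
    print(len(nodes), "nodes,", len(links), "links")
    print("route from (0, 0) to (2, 2):", xy_route((0, 0), (2, 2)))
```

Under these assumptions, a transfer between CGRAs attached to opposite corners of the grid crosses four links, which gives a rough sense of the longest path the communication backbone would have to serve.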

Acknowledgment

This research work was jointly conducted by the Department of Electronics and Communications Engineering, Tampere University of Technology, Finland, and the Department of Computer Science, University of Chicago, Illinois, USA. It was partially funded by the Academy of Finland under contract # 258506 (DEFT: Design of a Highly-parallel Heterogeneous MP-SoC Architecture for Future Wireless Technologies) and by the Tampere Doctoral Programme in Information Science and Engineering, Finland. The Department of Computer Science, University of Chicago, Illinois, USA, also provided financial and on-site resources for its implementation.

The authors sincerely acknowledge Bob Bartlett, Director of the Technical Staff at the Department of Computer Science, University of Chicago, IL, USA, for his consistent training and support, and for making the required computing infrastructure available at short notice for the implementation of this research work.

Corresponding author

Correspondence to Waqar Hussain.

Cite this article

Hussain, W., Airoldi, R., Hoffmann, H. et al. HARP2: An X-Scale Reconfigurable Accelerator-Rich Platform for Massively-Parallel Signal Processing Algorithms. J Sign Process Syst 85, 341–353 (2016). https://doi.org/10.1007/s11265-015-1054-9

  • DOI: https://doi.org/10.1007/s11265-015-1054-9
